Wraps KMeans_rcpp to add a column of cluster assignments computed from selected columns of the data frame.
Cluster names are capital letters (A, B, C, ...).
add_clusters(.data, ..., n_clusters = 4, cluster_name = "cluster")
.data: data frame
...: columns to cluster (tidyselect)
n_clusters: integer; number of clusters
cluster_name: column name for the new cluster column
Returns: data frame
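As a rough illustration of the idea described above (cluster the selected columns, then label each cluster with a capital letter), here is a minimal base-R sketch. It assumes stats::kmeans as a stand-in for KMeans_rcpp. add_clusters_sketch is a hypothetical helper, not the package's implementation, and the order in which the real function assigns letters may differ.

```r
# Hypothetical helper for illustration only; not the package's code.
add_clusters_sketch <- function(data, cols, n_clusters = 4, cluster_name = "cluster") {
  # LETTERS has 26 entries, so this sketch supports at most 26 clusters
  stopifnot(n_clusters >= 1, n_clusters <= 26)

  # Base R k-means, used here in place of KMeans_rcpp (assumption)
  fit <- stats::kmeans(data[, cols, drop = FALSE], centers = n_clusters, nstart = 10)

  # Map integer cluster ids 1..k to capital letters and store them as a factor
  data[[cluster_name]] <- factor(LETTERS[fit$cluster],
                                 levels = LETTERS[seq_len(n_clusters)])
  data
}

set.seed(1)  # k-means starts from random centers; fix the seed for repeatability
iris_sketch <- add_clusters_sketch(iris, c("Sepal.Width", "Sepal.Length"),
                                   n_clusters = 3, cluster_name = "Sepal_Cluster")
table(iris_sketch$Sepal_Cluster)
```

Storing the labels as a factor with fixed levels keeps every cluster letter available for grouping, even if a later filter removes all rows of one cluster.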
iris %>%
tibble::as_tibble() %>%
add_clusters(Sepal.Width, Sepal.Length, n_clusters = 3, cluster_name = "Sepal_Cluster") -> iris1
iris1
#> # A tibble: 150 × 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal_Cluster
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa A
#> 2 4.9 3 1.4 0.2 setosa A
#> 3 4.7 3.2 1.3 0.2 setosa A
#> 4 4.6 3.1 1.5 0.2 setosa A
#> 5 5 3.6 1.4 0.2 setosa A
#> 6 5.4 3.9 1.7 0.4 setosa A
#> 7 4.6 3.4 1.4 0.3 setosa A
#> 8 5 3.4 1.5 0.2 setosa A
#> 9 4.4 2.9 1.4 0.2 setosa A
#> 10 4.9 3.1 1.5 0.1 setosa A
#> # … with 140 more rows
iris1 %>%
numeric_summary(original_col = Sepal.Width, bucket_col = Sepal_Cluster)
#> # A tibble: 3 × 11
#> Sepal_Cluster .min .mean .max .count .uniques relative_value .sum .med
#> <fct> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>
#> 1 C 2.5 3.07 3.8 47 12 89.7 144. 3
#> 2 B 2 2.69 3.4 53 12 78.5 143. 2.7
#> 3 A 2.3 3.43 4.4 50 16 100 171. 3.4
#> # … with 2 more variables: .sd <dbl>, width <dbl>