Wraps KMeans_rcpp to add a column of cluster assignments computed from selected columns of the data frame. Clusters are named with capital letters (A, B, C, and so on).

add_clusters(.data, ..., n_clusters = 4, cluster_name = "cluster")

Arguments

.data

A data frame.

...

Columns to cluster on, selected with tidyselect syntax.

n_clusters

Integer. The number of clusters to create.

cluster_name

Name of the new column that holds the cluster labels.

Value

The input data frame with the cluster column added.
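
To make the behavior described above concrete, here is a minimal sketch of what a wrapper of this kind might look like. It is not the package's implementation: it uses base R's stats::kmeans as a stand-in for KMeans_rcpp, takes column names as a character vector instead of tidyselect, and the names add_clusters_sketch, cols, and seed are hypothetical.

```r
# Hypothetical stand-in for add_clusters(); NOT the package's implementation.
# Assumption: stats::kmeans (base R) replaces KMeans_rcpp, and columns are
# passed as a character vector rather than through tidyselect.
add_clusters_sketch <- function(.data, cols, n_clusters = 4,
                                cluster_name = "cluster", seed = 1) {
  stopifnot(is.data.frame(.data), n_clusters <= length(LETTERS))

  # k-means starts from random centers; fix the seed for repeatable labels
  set.seed(seed)
  fit <- stats::kmeans(.data[, cols, drop = FALSE], centers = n_clusters)

  # Map integer cluster ids 1..k to capital letters and store them as a factor
  .data[[cluster_name]] <- factor(LETTERS[fit$cluster],
                                  levels = LETTERS[seq_len(n_clusters)])
  .data
}

iris2 <- add_clusters_sketch(iris, c("Sepal.Width", "Sepal.Length"),
                             n_clusters = 3, cluster_name = "Sepal_Cluster")

# Number of rows assigned to each lettered cluster
table(iris2$Sepal_Cluster)
```

Which letter a given group receives depends on the random starting centers, so the letters label the clusters but carry no ordering.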

Examples


# Cluster on the two sepal measurements; labels go in Sepal_Cluster
iris1 <- iris %>%
  tibble::as_tibble() %>%
  add_clusters(Sepal.Width, Sepal.Length, n_clusters = 3, cluster_name = "Sepal_Cluster")

iris1
#> # A tibble: 150 × 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal_Cluster
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <fct>        
#>  1          5.1         3.5          1.4         0.2 setosa  A            
#>  2          4.9         3            1.4         0.2 setosa  A            
#>  3          4.7         3.2          1.3         0.2 setosa  A            
#>  4          4.6         3.1          1.5         0.2 setosa  A            
#>  5          5           3.6          1.4         0.2 setosa  A            
#>  6          5.4         3.9          1.7         0.4 setosa  A            
#>  7          4.6         3.4          1.4         0.3 setosa  A            
#>  8          5           3.4          1.5         0.2 setosa  A            
#>  9          4.4         2.9          1.4         0.2 setosa  A            
#> 10          4.9         3.1          1.5         0.1 setosa  A            
#> # … with 140 more rows

# Summarize Sepal.Width within each cluster
iris1 %>%
  numeric_summary(original_col = Sepal.Width, bucket_col = Sepal_Cluster)
#> # A tibble: 3 × 11
#>   Sepal_Cluster  .min .mean  .max .count .uniques relative_value  .sum  .med
#>   <fct>         <dbl> <dbl> <dbl>  <int>    <int>          <dbl> <dbl> <dbl>
#> 1 C               2.5  3.07   3.8     47       12           89.7  144.   3  
#> 2 B               2    2.69   3.4     53       12           78.5  143.   2.7
#> 3 A               2.3  3.43   4.4     50       16          100    171.   3.4
#> # … with 2 more variables: .sd <dbl>, width <dbl>