Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing task: binning accounts by their sales. For example, you might create 10 bins, where each bin represents 10% of all market sales. The top bin contains the highest-sales accounts and therefore has the fewest accounts, whereas the bottom bin contains the lowest-sales accounts and therefore needs the most accounts to reach 10% of market sales.
tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data
sales_data %>%
  bin_cols(SALES, bin_type = "value") -> sales_data1
sales_data1
#> # A tibble: 1,000 × 2
#> SALES SALES_va10
#> <int> <int>
#> 1 7329 2
#> 2 10518 5
#> 3 13078 8
#> 4 7592 2
#> 5 9253 4
#> 6 13724 9
#> 7 16802 10
#> 8 13332 8
#> 9 11916 7
#> 10 10838 5
#> # … with 990 more rows
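To make the idea concrete, here is a minimal base R sketch of one way to bin by value: sort the values, track the running share of the total, and cut that share into equal slices. The helper name `bin_by_value_sketch` is hypothetical, and this is not the package's implementation; it only illustrates the technique, and its bin edges may differ from what `bin_cols()` produces.

```r
# Hypothetical helper, not part of the package: assign each value to one of
# `n_bins` bins so that every bin holds roughly the same share of the total.
bin_by_value_sketch <- function(x, n_bins = 10L) {
  ord <- order(x)                        # positions of x from smallest to largest
  cum_share <- cumsum(x[ord]) / sum(x)   # running share of the total
  # Map the running share onto labels 1..n_bins. The tiny offset keeps a share
  # that lands exactly on a boundary in the lower bin; pmin/pmax clamp the ends.
  bins_sorted <- pmin(n_bins, pmax(1L, ceiling(cum_share * n_bins - 1e-9)))
  bins <- integer(length(x))
  bins[ord] <- as.integer(bins_sorted)   # restore the original row order
  bins
}

set.seed(1)                              # arbitrary seed, for this sketch only
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- bin_by_value_sketch(sales)

shares <- tapply(sales, bins, sum) / sum(sales)  # share of total sales per bin
counts <- table(bins)                            # number of accounts per bin
shares
counts
```

Under these assumptions each bin's share should sit close to 0.10, while the account counts shrink as the bin label rises, because fewer large accounts are needed to make up the same slice of the total.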
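Notice that the sum (`.sum`) is approximately equal across bins, each holding about 10% of total sales, while the number of accounts (`.count`) grows as the sales values get smaller.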
sales_data1 %>%
  bin_summary() %>%
  print(width = Inf)
#> # A tibble: 10 × 14
#> column method n_bins .rank .min .mean .max .count .uniques
#> <chr> <chr> <int> <int> <int> <dbl> <int> <int> <int>
#> 1 SALES equal value 10 10 15016 16322. 18895 62 62
#> 2 SALES equal value 10 9 13428 14067. 15010 70 69
#> 3 SALES equal value 10 8 12453 12935. 13427 78 74
#> 4 SALES equal value 10 7 11554 11987. 12445 83 77
#> 5 SALES equal value 10 6 10855 11180. 11544 89 85
#> 6 SALES equal value 10 5 10083 10447. 10849 96 92
#> 7 SALES equal value 10 4 9161 9641. 10072 103 97
#> 8 SALES equal value 10 3 8264 8708. 9152 115 109
#> 9 SALES equal value 10 2 7216 7763. 8257 128 122
#> 10 SALES equal value 10 1 55 5652. 7189 176 170
#> relative_value .sum .med .sd width
#> <dbl> <int> <dbl> <dbl> <int>
#> 1 100 1011944 16006. 976. 3879
#> 2 86.2 984694 13944. 484. 1582
#> 3 79.3 1008951 12955 299. 974
#> 4 73.4 994957 11963 264. 891
#> 5 68.5 994996 11157 187. 689
#> 6 64.0 1002907 10448 230. 766
#> 7 59.1 993053 9625 262. 911
#> 8 53.4 1001436 8683 260. 888
#> 9 47.6 993665 7743 305. 1041
#> 10 34.6 994815 6094. 1421. 7134
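As a quick check on the summary, the sketch below converts the `.sum` column into each bin's percentage of total sales. The values are copied by hand from the output above, and the variable names `bin_sums` and `share_pct` are illustrative only.

```r
# .sum values copied from the bin_summary() output above, ranks 10 down to 1
bin_sums <- c(1011944, 984694, 1008951, 994957, 994996,
              1002907, 993053, 1001436, 993665, 994815)

# Each bin's percentage of total sales, labelled by rank
share_pct <- round(100 * bin_sums / sum(bin_sums), 2)
names(share_pct) <- 10:1

share_pct
range(share_pct)   # smallest and largest share across the bins
```

Every bin lands within about 0.15 percentage points of 10%. The shares are not exactly equal because a bin boundary can only fall between whole accounts.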