tidybins



library(tidybins)
suppressPackageStartupMessages(library(dplyr))

Bin Value

Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing case in which accounts need to be binned by their sales: for example, creating 10 bins where each bin represents 10% of all market sales. The first bin contains the highest-sales accounts and therefore has the smallest number of accounts, whereas the last bin contains the lowest-sales accounts and therefore needs the largest number of accounts to reach 10% of market sales.
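To make the idea concrete, here is a minimal base R sketch of equal-value binning. It is not the package's implementation: `equal_value_bins()` is a hypothetical helper written only for illustration, and it assumes mostly positive values. It numbers bins the way the paragraph above describes them (bin 1 holds the largest accounts), which may differ from the labels the package itself assigns.

```r
# Hypothetical helper, not part of tidybins: assign each value to one of
# `n_bins` bins so that every bin holds roughly the same share of the total.
equal_value_bins <- function(x, n_bins = 10L) {
  ord <- order(x, decreasing = TRUE)      # largest values first
  cum_share <- cumsum(x[ord]) / sum(x)    # running share of the total
  # Bin 1 covers the first 1/n_bins of the total, bin 2 the next, and so on;
  # pmin() guards against the cumulative share overshooting the last bin.
  bin_sorted <- pmin(ceiling(cum_share * n_bins), n_bins)
  bins <- integer(length(x))
  bins[ord] <- as.integer(bin_sorted)     # restore the original row order
  bins
}

set.seed(1)
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- equal_value_bins(sales)

# Per-bin totals are close to equal; per-bin counts are not.
bin_totals <- tapply(sales, bins, sum)
bin_counts <- table(bins)
bin_totals
bin_counts
```

Because each account is small relative to a bin's target total, the per-bin sums can only miss the target by roughly one account's sales at each boundary.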


tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data

sales_data %>% 
  bin_cols(SALES, bin_type = "value") -> sales_data1

sales_data1
#> # A tibble: 1,000 × 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1  7329          2
#>  2 10518          5
#>  3 13078          8
#>  4  7592          2
#>  5  9253          4
#>  6 13724          9
#>  7 16802         10
#>  8 13332          8
#>  9 11916          7
#> 10 10838          5
#> # … with 990 more rows

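Notice that the sum (the `.sum` column) is approximately equal across bins, while the number of accounts per bin (`.count`) grows as the sales per account shrink.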

sales_data1 %>% 
  bin_summary() %>% 
  print(width = Inf)
#> # A tibble: 10 × 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 15016 16322. 18895     62       62
#>  2 SALES  equal value     10     9 13428 14067. 15010     70       69
#>  3 SALES  equal value     10     8 12453 12935. 13427     78       74
#>  4 SALES  equal value     10     7 11554 11987. 12445     83       77
#>  5 SALES  equal value     10     6 10855 11180. 11544     89       85
#>  6 SALES  equal value     10     5 10083 10447. 10849     96       92
#>  7 SALES  equal value     10     4  9161  9641. 10072    103       97
#>  8 SALES  equal value     10     3  8264  8708.  9152    115      109
#>  9 SALES  equal value     10     2  7216  7763.  8257    128      122
#> 10 SALES  equal value     10     1    55  5652.  7189    176      170
#>    relative_value    .sum   .med   .sd width
#>             <dbl>   <int>  <dbl> <dbl> <int>
#>  1          100   1011944 16006.  976.  3879
#>  2           86.2  984694 13944.  484.  1582
#>  3           79.3 1008951 12955   299.   974
#>  4           73.4  994957 11963   264.   891
#>  5           68.5  994996 11157   187.   689
#>  6           64.0 1002907 10448   230.   766
#>  7           59.1  993053  9625   262.   911
#>  8           53.4 1001436  8683   260.   888
#>  9           47.6  993665  7743   305.  1041
#> 10           34.6  994815  6094. 1421.  7134
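As a quick arithmetic check on the summary, the `.sum` values printed above can be turned into shares of total sales. The numbers below are copied by hand from that output; the variable names are chosen only for this sketch.

```r
# .sum column copied from the bin_summary() output, rows 1 to 10
# (that is, .rank 10 down to .rank 1).
bin_sums <- c(1011944, 984694, 1008951, 994957, 994996,
              1002907, 993053, 1001436, 993665, 994815)

# Each bin's percentage of total sales.
share <- 100 * bin_sums / sum(bin_sums)
round(share, 2)

# Smallest and largest share across the ten bins.
range(share)
```

Every bin lands within about 0.15 percentage points of the 10% target, even though the bin counts range from 62 to 176 accounts.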