Adapted from the dummy_cols function in the fastDummies package.
Adds the option to truncate the dummy column names
and to specify the dummy columns using tidyselect.
create_dummies(
  .data,
  ...,
  append_col_name = TRUE,
  max_levels = 10L,
  remove_first_dummy = FALSE,
  remove_most_frequent_dummy = FALSE,
  clean_names = TRUE,
  ignore_na = FALSE,
  split = NULL,
  remove_selected_columns = TRUE
)
.data: a data frame
...: tidyselect columns. The default selection is all character or factor variables.
append_col_name: logical, default TRUE. Appends the original column name to the dummy column name.
max_levels: uses fct_lump_n to limit the number of categories. Only the top n levels are preserved; the rest are lumped into "other". The default is 10 levels, to prevent accidental overload. Set the value to Inf to use all levels.
remove_first_dummy: logical, default FALSE
remove_most_frequent_dummy: logical, default FALSE
clean_names: logical, default TRUE. Applies clean_names.
ignore_na: logical, default FALSE
split: default NULL
remove_selected_columns: logical, default TRUE
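The level-lumping behaviour described for max_levels can be illustrated with forcats::fct_lump_n on its own. This is a minimal sketch, not the internals of create_dummies: the toy factor x is hypothetical, and how create_dummies passes max_levels through to fct_lump_n is an assumption based on the argument description. Note that fct_lump_n labels the lumped level "Other" by default.

```r
library(forcats)

# Hypothetical toy data: five distinct levels with uneven frequencies
x <- factor(c("a", "a", "a", "b", "b", "c", "d", "e"))

# Keep the 2 most frequent levels; all remaining levels are lumped together
lumped <- fct_lump_n(x, n = 2)

levels(lumped)
table(lumped)
```

Here "a" and "b" are kept, and the three single-occurrence levels are merged into one "Other" level, which is what keeps the number of dummy columns bounded.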
Returns a data frame.
Refer to the fastDummies package documentation for details on the original function.
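For comparison, the original function can be called directly. The sketch below uses fastDummies::dummy_cols on iris; it shows the original function's behaviour only, and any correspondence with the same-named arguments of create_dummies is an assumption.

```r
library(fastDummies)

# The original function selects columns by name rather than by tidyselect
dummies <- dummy_cols(
  iris,
  select_columns = "Species",
  remove_selected_columns = TRUE
)
names(dummies)

# remove_first_dummy drops the first dummy of each variable,
# leaving n - 1 dummy columns
reduced <- dummy_cols(
  iris,
  select_columns = "Species",
  remove_first_dummy = TRUE,
  remove_selected_columns = TRUE
)
names(reduced)
```

By default, dummy_cols prefixes each dummy column with the original column name (for example Species_setosa).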
iris %>%
  create_dummies(Species, append_col_name = FALSE) %>%
  tibble::as_tibble()
#> 1 column(s) have become 3 dummy columns
#> # A tibble: 150 × 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width setosa versicolor virginica
#>           <dbl>       <dbl>        <dbl>       <dbl>  <int>      <int>     <int>
#>  1          5.1         3.5          1.4         0.2      1          0         0
#>  2          4.9         3            1.4         0.2      1          0         0
#>  3          4.7         3.2          1.3         0.2      1          0         0
#>  4          4.6         3.1          1.5         0.2      1          0         0
#>  5          5           3.6          1.4         0.2      1          0         0
#>  6          5.4         3.9          1.7         0.4      1          0         0
#>  7          4.6         3.4          1.4         0.3      1          0         0
#>  8          5           3.4          1.5         0.2      1          0         0
#>  9          4.4         2.9          1.4         0.2      1          0         0
#> 10          4.9         3.1          1.5         0.1      1          0         0
#> # ℹ 140 more rows