Runs a cross-validated xgboost model and, optionally, a regularized linear model, then reports accuracy metrics. Automatically determines from the provided formula whether the problem is regression or classification.
auto_model_accuracy(
  data,
  formula,
  ...,
  n_folds = 4,
  as_flextable = TRUE,
  include_linear = FALSE,
  theme = "tron",
  seed = 1,
  mtry = 1,
  trees = 15L,
  min_n = 1L,
  tree_depth = 6L,
  learn_rate = 0.3,
  loss_reduction = 0,
  sample_size = 1,
  stop_iter = 10L,
  counts = FALSE,
  penalty = 0.015,
  mixture = 0.35
)
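data: a data frame
formula: a model formula
...: any other parameters passed to xgboost
n_folds: number of cross-validation folds
as_flextable: if TRUE, returns a flextable; if FALSE, returns a tibble
include_linear: if TRUE, also includes a regularized linear model
theme: make_flextable theme
seed: random seed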
mtry: # Randomly Selected Predictors (xgboost: colsample_bynode); type: numeric, range 0 - 1, or type: integer if counts = TRUE; default: 1
trees: # Trees (xgboost: nrounds); type: integer, default: 15L
min_n: Minimal Node Size (xgboost: min_child_weight); type: integer, default: 1L; typical range: 2-10. Keep this value small for highly imbalanced class data, where leaf nodes can have smaller size groups. Otherwise increase it to prevent overfitting outliers.
tree_depth: Tree Depth (xgboost: max_depth); type: integer, default: 6L; typical values: 3-10
learn_rate: Learning Rate (xgboost: eta); type: double, default: 0.3; typical values: 0.01-0.3
loss_reduction: Minimum Loss Reduction (xgboost: gamma); type: double, default: 0; range: 0 to Inf; typical values: 0-20, assuming low-to-mid tree depth
sample_size: Proportion Observations Sampled (xgboost: subsample); type: double, default: 1; typical values: 0.5-1
stop_iter: # Iterations Before Stopping (xgboost: early_stop); type: integer, default: 10L; only enabled if a validation set is provided
counts: if TRUE, specify mtry as an integer number of columns; if FALSE (the default), specify mtry as a fraction of columns from 0 to 1
penalty: regularization parameter for the linear model
mixture: linear model parameter that combines L1 and L2 regularization
a table of accuracy metrics: a flextable if as_flextable = TRUE, otherwise a tibble
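
The following is a minimal sketch of how the function might be called, not an excerpt from the package's own examples. It assumes the function is exported by an installed package named autostats (an assumption; the package name is not given here) and uses the built-in iris data. The object names petal_form, acc_default and acc_tuned are illustrative only. The formula lists its predictors explicitly, and the numeric outcome should lead to a regression fit.

```r
# Regression formula with explicit predictors from the built-in iris data.
petal_form <- Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species

acc_default <- NULL
acc_tuned <- NULL

# Assumption: auto_model_accuracy() lives in a package called autostats.
# The calls are skipped when that package is not installed.
if (requireNamespace("autostats", quietly = TRUE)) {
  library(autostats)

  # Defaults from the usage above, returning a tibble instead of a flextable.
  acc_default <- auto_model_accuracy(
    iris,
    petal_form,
    as_flextable = FALSE
  )

  # Add the regularized linear model and give mtry as a column count
  # (counts = TRUE) instead of a fraction of columns.
  acc_tuned <- auto_model_accuracy(
    iris,
    petal_form,
    n_folds = 5,
    as_flextable = FALSE,
    include_linear = TRUE,
    mtry = 3L,
    counts = TRUE,
    trees = 50L,
    learn_rate = 0.1
  )
}
```

Every argument after the dots has to be passed by name, because positional matching stops at `...`. Setting as_flextable = FALSE keeps the result as a tibble, which is easier to filter or join than a formatted table.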