auto model accuracy — auto_model_accuracy • autostats

Runs a cross-validated xgboost model and, when include_linear = TRUE, a regularized linear model, then reports accuracy metrics. Automatically determines whether the provided formula describes a regression or a classification problem.

auto_model_accuracy(
  data,
  formula,
  ...,
  n_folds = 4,
  as_flextable = TRUE,
  include_linear = FALSE,
  theme = "tron",
  seed = 1,
  mtry = 1,
  trees = 15L,
  min_n = 1L,
  tree_depth = 6L,
  learn_rate = 0.3,
  loss_reduction = 0,
  sample_size = 1,
  stop_iter = 10L,
  counts = FALSE,
  penalty = 0.015,
  mixture = 0.35
)
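
A minimal sketch of a call, assuming the autostats package is installed. The mtcars data set and the formula are illustrative choices, not taken from the package documentation, and the call is skipped when the package is unavailable.

```r
# Illustrative only: mtcars ships with base R and all of its columns are numeric.
model_formula <- mpg ~ wt + hp + disp  # numeric target, so a regression is expected

if (requireNamespace("autostats", quietly = TRUE)) {
  accuracy_tbl <- autostats::auto_model_accuracy(
    data = mtcars,
    formula = model_formula,
    n_folds = 4,           # same as the documented default
    as_flextable = FALSE,  # ask for a tibble instead of a flextable
    seed = 1
  )
  print(accuracy_tbl)
}
```

Setting as_flextable = FALSE keeps the result easy to inspect in a console session.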

Arguments

data: data frame
formula: model formula
...: additional parameters passed to xgboost
n_folds: number of cross-validation folds
as_flextable: if TRUE (the default), returns a flextable; if FALSE, returns a tibble
include_linear: if TRUE, also fits a regularized linear model
theme: make_flextable theme
seed: random seed
mtry: # Randomly Selected Predictors (xgboost: colsample_bynode) (type: numeric, range 0 - 1, default: 1) (or type: integer if counts = TRUE)
trees: # Trees (xgboost: nrounds) (type: integer, default: 15L)
min_n: Minimal Node Size (xgboost: min_child_weight) (type: integer, default: 1L); typical range: 2-10. Keep the value small for highly imbalanced class data, where leaf nodes can contain small groups. Otherwise increase it to avoid overfitting to outliers.
tree_depth: Tree Depth (xgboost: max_depth) (type: integer, default: 6L); typical values: 3-10
learn_rate: Learning Rate (xgboost: eta) (type: double, default: 0.3); typical values: 0.01-0.3
loss_reduction: Minimum Loss Reduction (xgboost: gamma) (type: double, default: 0); range: 0 to Inf; typical values: 0-20, assuming low to mid tree depth
sample_size: Proportion Observations Sampled (xgboost: subsample) (type: double, default: 1); typical values: 0.5-1
stop_iter: # Iterations Before Stopping (xgboost: early_stop) (type: integer, default: 10L); only enabled if a validation set is provided
counts: if TRUE, specify mtry as an integer number of columns. The default, FALSE, specifies mtry as a fraction of columns from 0 to 1
penalty: regularization strength for the linear model (default: 0.015)
mixture: mixing parameter for the linear model; combines L1 and L2 regularization (default: 0.35)
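
The sketch below illustrates how mtry and counts relate, and how the linear-model arguments could be passed together. It assumes autostats is installed; the data set, the formula, and the two argument lists are hypothetical. The linear model may also need extra packages to be installed, which is an assumption rather than something this page states.

```r
model_formula <- mpg ~ wt + hp + disp + qsec
n_predictors <- length(all.vars(model_formula)) - 1L  # target excluded

# Two hypothetical ways to request half of the predictors:
args_fraction <- list(mtry = 0.5, counts = FALSE)  # fraction of columns
args_count <- list(mtry = 2L, counts = TRUE)       # integer number of columns

if (requireNamespace("autostats", quietly = TRUE)) {
  base_args <- list(
    data = mtcars,
    formula = model_formula,
    as_flextable = FALSE,
    include_linear = TRUE,  # also fit the regularized linear model
    penalty = 0.015,        # documented default
    mixture = 0.35          # documented default
  )
  res_fraction <- do.call(autostats::auto_model_accuracy,
                          c(base_args, args_fraction))
  print(res_fraction)
}
```

Swapping args_fraction for args_count in the do.call() line would express the same request as a column count.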

Value

A table of accuracy metrics: a flextable by default, or a tibble if as_flextable = FALSE.
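
As a rough illustration of the two return types with a classification target, the sketch below recodes a column of mtcars as a two-level factor. The data preparation and object names are assumptions for illustration, and the calls only run if autostats is installed.

```r
# Hypothetical binary target: transmission type as a factor (0 = automatic, 1 = manual).
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
class_formula <- am ~ wt + hp + mpg  # factor target, so a classification is expected

if (requireNamespace("autostats", quietly = TRUE)) {
  as_tibble_result <- autostats::auto_model_accuracy(
    cars, class_formula, n_folds = 2, as_flextable = FALSE
  )
  as_flextable_result <- autostats::auto_model_accuracy(
    cars, class_formula, n_folds = 2, as_flextable = TRUE
  )
  # Inspect which kind of table each call produced
  print(class(as_tibble_result))
  print(class(as_flextable_result))
}
```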