Automatic evaluation of different forms (functions) of predictors to obtain the best Generalized Additive Model (GAM).

autoGAM_frame(
  formula,
  resp.base = NULL,
  forms = list("identity", "logb", "exp", power = 2:3, poly = 2:3),
  data,
  ignore.outliers = F,
  family = gaussian(link = "identity"),
  metric = "AIC",
  raw.poly = F,
  interval.alpha = 0.05,
  parallel = F,
  core.nums = NULL
)

Arguments

formula

A formula with a single response on the left-hand side and one or more predictors on the right-hand side.

resp.base

Base (reference) level of a binary response variable. Default is NULL.
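As a rough illustration, the sketch below assumes that resp.base plays the same role as the reference level of a factor response in a binomial glm(). There, the first level is treated as "failure" and the model describes the probability of the other level. The value "manual" and the recoded mtcars column are hypothetical; the sketch does not show how autoGAM_frame handles this internally.

```r
# Assumption: resp.base names the reference (first) level of a binary response.
# In a binomial glm() with a factor response, the first level is "failure" and
# the model describes P(response != first level).
d <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                   labels = c("automatic", "manual")))
d$am <- relevel(d$am, ref = "manual")  # hypothetical resp.base = "manual"

fit <- glm(am ~ disp, data = d, family = binomial(link = "logit"))

# With "manual" as the base level, the model describes P(automatic)
levels(d$am)
coef(fit)
```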

forms

A named list of the forms (functions) to try for each continuous predictor, together with their respective degree(s) or degrees of freedom. Commonly used functions include identity(), log(), log2(), logb(), exp(), bs() and ns() (from the splines package), s() (from the gam package) and cut(). Default is list('identity','logb','exp','power'=2:3,'poly'=2:3). The names of the list are the function names, and their values are the corresponding degree(s) or degrees of freedom. Only functions of the form f(x) or f(x, degree) that return a vector or a matrix should be included in this list. Functions of the form f(x) can be passed in two ways: 'f'=c() or simply 'f'. Functions of the form f(x, degree) should be passed as 'f'=desired degree(s). If a function of the form f(x, degree) is passed as a single character string, the default value of its degree argument is used; if that argument has no default, an error occurs.
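The naming rules above can be sketched as a small helper that expands a forms list into candidate terms for one predictor. expand_forms() is hypothetical and is not part of the package. It only illustrates how unnamed elements, 'f'=c() elements and 'f'=degree(s) elements map to f(x) and f(x, d) terms.

```r
# Hypothetical helper (not the package's code): expand a `forms` list into
# candidate term strings for a single predictor `x`.
expand_forms <- function(forms, x) {
  nms <- names(forms)
  if (is.null(nms)) nms <- rep("", length(forms))
  terms <- character(0)
  for (i in seq_along(forms)) {
    if (nms[i] == "") {
      # Unnamed element: the value itself is the function name -> f(x)
      terms <- c(terms, sprintf("%s(%s)", forms[[i]], x))
    } else if (length(forms[[i]]) == 0) {
      # Named element with empty degrees ('f' = c()) -> f(x)
      terms <- c(terms, sprintf("%s(%s)", nms[i], x))
    } else {
      # Named element with degrees -> one f(x, d) per degree
      terms <- c(terms, sprintf("%s(%s, %s)", nms[i], x, forms[[i]]))
    }
  }
  terms
}

# The default forms list applied to a predictor called `disp`
expand_forms(list("identity", "logb", "exp", power = 2:3, poly = 2:3), "disp")
```

Under these assumptions, list(exp = c(), ns = c(3, 5)) would expand to exp(x), ns(x, 3) and ns(x, 5).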

data

A data frame containing the response and all predictors included in formula.

ignore.outliers

Logical indicating whether outliers should be ignored while evaluating the predictor forms. Default is FALSE. When TRUE, response outliers are detected by car::outlierTest(), which is based on the studentized residuals of the records, and predictor outliers are the records whose hat-values exceed 2p/n (where p is the number of parameters in the model and n is the number of records).
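A minimal sketch of the two outlier rules on an ordinary linear model follows. It assumes a Bonferroni cutoff of 0.05 for car::outlierTest() and the 2p/n leverage threshold. The model and the way flagged rows are combined are illustrative and may differ from what autoGAM_frame does internally.

```r
library(car)  # provides outlierTest()

fit <- lm(mpg ~ disp + drat, data = mtcars)

# Response outliers: Bonferroni-adjusted test on studentized residuals
ot <- outlierTest(fit)
resp_outliers <- names(which(ot$bonf.p < 0.05))

# Predictor (high-leverage) records: hat-values above 2p/n
h <- hatvalues(fit)
p <- length(coef(fit))  # number of model parameters
n <- nobs(fit)          # number of records
lev_outliers <- names(which(h > 2 * p / n))

# Records that would remain after dropping both kinds of outliers
keep <- setdiff(rownames(mtcars), union(resp_outliers, lev_outliers))
```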

family

Family (error distribution and link function) of the response variable used in the model fits. Default is gaussian(link='identity').

metric

Name of the metric used to evaluate model performance. Valid values are 'AIC', 'BIC' and 'AICc'; lower values indicate a better model. Default is 'AIC'.
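To show how a metric can rank competing forms, the sketch below scores a few hypothetical forms of disp and keeps the one with the lowest value. AICc is computed by hand with the usual small-sample correction AIC + 2k(k+1)/(n-k-1). The package may compute it differently, and model_metric() is a hypothetical helper.

```r
# Hypothetical helper: one of the supported metrics for a fitted model.
# AICc uses the standard correction AIC + 2k(k+1)/(n-k-1) (assumption).
model_metric <- function(fit, metric = c("AIC", "BIC", "AICc")) {
  metric <- match.arg(metric)
  k <- attr(logLik(fit), "df")  # number of estimated parameters
  n <- nobs(fit)
  switch(metric,
    AIC  = AIC(fit),
    BIC  = BIC(fit),
    AICc = AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
  )
}

# Hypothetical candidate forms of `disp`; the lowest metric wins
candidates <- c("disp", "log(disp)", "sqrt(disp)", "poly(disp, 2)", "poly(disp, 3)")
scores <- sapply(candidates, function(term) {
  fit <- glm(as.formula(paste("mpg ~", term)), data = mtcars,
             family = gaussian(link = "identity"))
  model_metric(fit, "AICc")
})
best_form <- names(which.min(scores))
```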

raw.poly

Logical indicating whether raw (non-orthogonal) polynomial forms should be included when polynomial forms ('poly') are evaluated. Default is FALSE.
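The sketch below assumes that the raw forms correspond to poly(x, degree, raw = TRUE) in base R. It shows that raw and orthogonal bases of the same degree give identical fits and AIC, but differ in their coefficients and column correlations.

```r
# Orthogonal (default) vs raw polynomial bases of the same degree
f_orth <- lm(mpg ~ poly(disp, 2), data = mtcars)
f_raw  <- lm(mpg ~ poly(disp, 2, raw = TRUE), data = mtcars)

# Same column space: fitted values and AIC agree
all.equal(fitted(f_orth), fitted(f_raw))
all.equal(AIC(f_orth), AIC(f_raw))

# Orthogonal columns are uncorrelated; raw columns (disp, disp^2) are not
cor(poly(mtcars$disp, 2))
cor(poly(mtcars$disp, 2, raw = TRUE))
```

Because the fits coincide, this choice mainly affects the interpretability and numerical conditioning of the coefficients, not the metric values.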

interval.alpha

Significance level alpha used to build the (1 - alpha) confidence intervals of the predictions. Default is 0.05.
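One common way to turn alpha into prediction intervals for a GLM is shown below. It builds a Wald interval on the link scale and maps it back with the inverse link. This is a sketch under that assumption; autoGAM_frame may construct its intervals differently.

```r
alpha <- 0.05  # plays the role of interval.alpha
fit <- glm(am ~ disp, data = mtcars, family = binomial(link = "logit"))

# Wald interval on the link scale, mapped back with the inverse link
pr  <- predict(fit, type = "link", se.fit = TRUE)
z   <- qnorm(1 - alpha / 2)
inv <- family(fit)$linkinv
ci  <- data.frame(
  fit   = inv(pr$fit),
  lower = inv(pr$fit - z * pr$se.fit),
  upper = inv(pr$fit + z * pr$se.fit)
)
head(ci)
```

Building the interval on the link scale keeps the bounds inside the valid range of the response (here, probabilities between 0 and 1).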

parallel

Logical indicating whether the different forms should be evaluated in parallel. Default is FALSE.

core.nums

Number of CPU cores to use for parallel evaluation. Default is NULL, in which case half of the available CPU cores are used.
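The following sketch uses the base parallel package to evaluate several candidate forms on a socket cluster. When core.nums is NULL, it uses half of the detected cores (rounded down, at least one). The candidate terms and the rounding rule are assumptions; this is not the package's internal code.

```r
library(parallel)

core.nums <- NULL  # NULL means "use half of the available cores"
n_cores <- if (is.null(core.nums)) {
  max(1L, detectCores() %/% 2L, na.rm = TRUE)  # detectCores() may return NA
} else {
  core.nums
}

# Hypothetical candidate forms of `disp`, each scored by AIC on a worker
candidates <- c("disp", "log(disp)", "sqrt(disp)", "poly(disp, 2)", "poly(disp, 3)")

cl <- makeCluster(n_cores)
aics <- tryCatch(
  parSapply(cl, candidates, function(term) {
    AIC(glm(as.formula(paste("mpg ~", term)), data = mtcars))
  }),
  finally = stopCluster(cl)  # always release the workers
)
names(which.min(aics))
```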

Value

A list containing information about the whole evaluation process. By default, the printed result shows only the final best forms of the continuous predictors and the final set of categorical predictors, but these are part of a larger list whose items can be accessed via:

  1. $data: The dataset used to fit the models during the evaluation process.

  2. $`forms info`: A nested data frame with full details of the evaluation process, including the values and predictions of every form for every predictor.

  3. $`best forms`: The best form of each continuous predictor, as selected by the evaluation process.

  4. $`final predictors`: The final predictors (the best forms of the continuous predictors plus the categorical predictors) included in the best GAM. If the backward argument was set to FALSE, the best forms of the continuous predictors (and possibly the categorical variables) are returned.

  5. $`response family`: The family of the model's response. This item is intended for internal use.

Author

Shahin Roshani

Examples

if (FALSE) autoGAM_frame(mpg ~ disp + drat + vs, data = transform(mtcars, vs = factor(vs)))