“Wrapper” approaches treat feature selection as a search problem: an algorithm searches through feature space by repeatedly fitting the model with different predictor subsets, and the best subset is chosen by some measure of performance (e.g., R^2 or root-mean-square error). One such search routine is backwards selection, also known as recursive feature elimination. In many cases, models with built-in feature selection will be more efficient than algorithms where the search for the right predictors is external to the model, because built-in selection couples the predictor search with parameter estimation and is usually guided by a single objective function (e.g., error rate or likelihood).
The algorithm first fits the model to all predictors, and each predictor is ranked according to its relevance to the model. At each iteration of feature selection, the S_i top-ranked predictors are retained, the model is refit, and performance is re-assessed. In this scheme the training data are used for at least three purposes: predictor selection, model fitting, and performance evaluation. Unless the number of samples is large, especially in relation to the number of variables, a single static training set may not be able to fulfill all of these needs.
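The iterative procedure above can be sketched in a few lines of base R. This is an illustrative toy, not caret's implementation: the ranking criterion (absolute t-statistic from a linear model), the performance measure (training-set RMSE), and the simulated data are all assumptions made for the example.

```r
# Toy backwards selection (recursive feature elimination) in base R.
# Assumptions: predictors are ranked by |t-statistic| of an lm fit,
# and performance is training-set RMSE -- both chosen for illustration.
set.seed(1)
n <- 100
x <- data.frame(matrix(rnorm(n * 5), n, 5))
names(x) <- paste0("V", 1:5)
y <- 2 * x$V1 - 3 * x$V2 + rnorm(n)   # only V1 and V2 are informative

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

keep <- names(x)        # start with all predictors
perf <- list()
for (size in c(5, 4, 3, 2)) {
  # Refit the model on the currently retained predictors
  fit <- lm(y ~ ., data = cbind(x[keep], y = y))
  perf[[as.character(length(keep))]] <- rmse(y, fitted(fit))
  # Rank predictors by |t-statistic| and retain the top (size - 1)
  tstats <- abs(coef(summary(fit))[-1, "t value"])
  keep <- names(sort(tstats, decreasing = TRUE))[seq_len(size - 1)]
}
```

Note that because performance here is measured on the same data used to fit and rank, the example also illustrates why an external resampling layer is needed in practice.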
The caret package, available on CRAN, contains functions for training and plotting classification and regression models. In this case, the rfe function is used to perform the feature selection. It has several arguments:
- x, a matrix or data frame of predictor variables
- y, a vector (numeric or factor) of outcomes
- sizes, an integer vector of the specific subset sizes that should be tested (the full model with all ncol(x) predictors is always evaluated, so sizes should not include ncol(x))
- rfeControl, a list of options specifying the model, the resampling method, and the functions used for prediction, ranking, and so on
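A call using these arguments might look like the following sketch. The simulated data are an assumption made for the example; lmFuncs (linear-model helper functions) and 5-fold cross-validation are one possible configuration of rfeControl, not the only one.

```r
# Sketch of an rfe() call, assuming simulated data and a linear-model
# configuration (lmFuncs) -- one of several helper sets caret provides.
library(caret)

set.seed(10)
x <- data.frame(matrix(rnorm(50 * 8), 50, 8))  # columns X1..X8
y <- 2 * x$X1 - x$X2 + rnorm(50)               # only X1, X2 matter

ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)
profile <- rfe(x, y, sizes = c(2, 4, 6), rfeControl = ctrl)

profile               # resampled performance for each subset size
predictors(profile)   # the variables retained in the best subset
```

Because rfeControl specifies a resampling method ("cv" here), the entire selection loop is repeated within each resample, which addresses the concern above about reusing one training set for selection, fitting, and evaluation.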