Advanced Feature Selection in Linear Models

"There is nothing permanent except change."
– Heraclitus

So far, we've examined the use of linear models for both quantitative and qualitative outcomes with an eye on feature selection, that is, the techniques that exclude useless or unwanted predictor variables. We saw that linear models can be quite useful in machine learning problems, and how piecewise linear models, in the form of multivariate adaptive regression splines, can capture non-linear relationships. Additional techniques have been developed and refined over the last couple of decades that can improve predictive ability and interpretability beyond the linear models discussed in the preceding chapters. Many modern datasets, such as those in the two prior chapters, have numerous features, and it isn't unreasonable to have datasets with thousands of potential features.

The methods in this chapter might prove to be a better way to approach feature reduction and selection. We'll look at the concept of regularization, where the coefficients are constrained or shrunk towards zero. There are many regularization methods and variations on them, but we'll focus on ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and, finally, elastic net, which combines the benefits of both techniques.
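To make the idea of shrinking coefficients towards zero concrete before the formal treatment, here is a minimal sketch in plain Python. It assumes a single predictor with no intercept, so both penalized estimates have simple closed forms. The function names, the toy data, and the penalty values are illustrative assumptions, not code or data from this book.

```python
# One-predictor sketch (no intercept) of how the two penalties shrink a coefficient.
# The data and penalty values are made up for illustration only.

def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for two equal-length sequences."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy, sxx

def ols_coef(x, y):
    """Least-squares slope: minimizes sum((y - b*x)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx

def ridge_coef(x, y, lam):
    """Ridge slope: minimizes sum((y - b*x)^2) + lam * b^2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """LASSO slope: minimizes sum((y - b*x)^2) + lam * |b| (soft-thresholding)."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # pull towards zero, stop at zero
    return (shrunk if sxy >= 0 else -shrunk) / sxx

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

print("OLS slope:", ols_coef(x, y))
for lam in (0.0, 10.0, 100.0):
    print("lambda =", lam,
          "| ridge:", ridge_coef(x, y, lam),
          "| LASSO:", lasso_coef(x, y, lam))
```

With this toy data, both penalized slopes equal the least-squares slope when the penalty is zero and move towards zero as the penalty grows. At the largest penalty, the ridge slope is small but still non-zero, while the LASSO slope is exactly zero, which is why LASSO can act as a feature selection method.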

The following are the topics we'll cover in this chapter:

  • Overview of regularization
  • Dataset creation
  • Ridge regression
  • LASSO
  • Elastic net