Abstract

Variable selection in high-dimensional settings is a common research area in statistical machine learning, and numerous effective approaches have been developed in recent years to address it. This study presents a double-step variable selection method for data in which pairwise interactions between explanatory variables exist. The aim is to improve the model's prediction accuracy on a given dataset while choosing the smallest set of explanatory variables, taking the interactions among them into account. A double-step method combining Random Forest (RF) and the Adaptive Elastic Net (AENET) was examined for modeling the potential health effects of environmental contamination. Both with and without interactions in the data, the double-step approach was compared with the single-step adaptive elastic net and with a two-step method that pairs CART with the adaptive elastic net. Performance was measured using RMSE, R², and the number of variables selected for the final model. The double-step RF+AENET approach produces a simple, parsimonious model. Despite the complex associations between exposure variables, it has the lowest false detection rate for null interactions. The screening and variable reduction in the RF step of RF+AENET effectively retain a set of variables that are correlated with the outcome. The double-step RF+AENET predicts better than the single-step technique and selects a sparse model that is close to the true model. Thus, when pairwise interactions between variables are present in the simulated biological dataset, the double-step technique is the better method for model prediction and parameter estimation.

Keywords: Adaptive Elastic Net, Random Forest, Variable Selection, CART.
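The three evaluation criteria named in the abstract (RMSE, R², and the number of variables in the final model) can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the study's own code. The function names, the `tol` threshold for treating a coefficient as zero, and the toy numbers are all assumptions made for the example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def n_selected(coefs, tol=1e-8):
    """Count coefficients whose magnitude exceeds tol (assumed threshold)."""
    return sum(1 for c in coefs if abs(c) > tol)


# Toy data, purely hypothetical.
y_obs = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
beta = [0.0, 1.2, 0.0, -0.4, 1e-12]

print(rmse(y_obs, y_hat), r_squared(y_obs, y_hat), n_selected(beta))
```

A lower RMSE and a higher R² indicate better prediction. A smaller count from `n_selected` indicates a sparser model, which is the sense in which the abstract calls the RF+AENET model simple.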
