Abstract

Classical statistical learning techniques struggle to perform feature selection in high-dimensional data that includes interaction effects, i.e., when one or more features influence the effect of another feature on the study outcome. Methods like penalized regression and sparse partial least squares regression can help, but penalization restricts the handling of interaction terms. This study proposes a Dimensionality Reduction based algorithm for High Dimensional feature Selection with Interactions (RHDSI), a new feature selection method that integrates dimensionality reduction and machine learning. The method can handle high-dimensional data, incorporate interaction terms, and perform statistically interpretable feature selection, enabling existing classical statistical techniques to work on high-dimensional data. RHDSI performs feature selection in three steps. The first step is a coarse feature selection through dimensionality reduction and statistical modeling on multiple resampled datasets and features, along with their interaction terms. The second step uses the pooled results for unsupervised statistical-learning-based feature refinement. Finally, supervised statistical-learning-based feature selection is performed on the refined feature set to identify the final features with interactions. We evaluate the performance of the algorithm on simulated data and real studies. RHDSI performs on par with or better than standard feature selection algorithms such as LASSO, subset selection, and sparse PLS.
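The abstract outlines the three-step structure but not the implementation details. The sketch below is only an illustration of how such a pipeline could be wired together from off-the-shelf components; the choices of PCA for dimensionality reduction, bootstrap resampling, k-means for the unsupervised refinement, and cross-validated LASSO for the final supervised step are assumptions made here, not the paper's stated methods.

```python
# A minimal sketch of a three-step RHDSI-style pipeline.
# All component choices (PCA, bootstrap resampling, k-means, LassoCV) are
# illustrative assumptions, not the techniques prescribed by the paper.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.cluster import KMeans
from sklearn.utils import resample

def rhdsi_sketch(X, y, n_resamples=50, n_components=5, random_state=0):
    rng = np.random.RandomState(random_state)
    # Expand the feature matrix with pairwise interaction terms.
    expander = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
    X_int = expander.fit_transform(X)
    names = expander.get_feature_names_out()

    # Step 1: coarse selection -- fit a low-dimensional model on many
    # resamples and map component loadings back to per-feature scores.
    scores = np.zeros((n_resamples, X_int.shape[1]))
    for b in range(n_resamples):
        Xb, yb = resample(X_int, y, random_state=rng.randint(1_000_000))
        pca = PCA(n_components=n_components).fit(Xb)
        reg = LinearRegression().fit(pca.transform(Xb), yb)
        scores[b] = np.abs(pca.components_.T @ reg.coef_)

    # Step 2: unsupervised refinement -- cluster the pooled importance
    # scores and keep the cluster with the larger mean importance.
    pooled = scores.mean(axis=0)
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(pooled.reshape(-1, 1))
    keep = np.argmax([pooled[km.labels_ == k].mean() for k in range(2)])
    refined = np.where(km.labels_ == keep)[0]

    # Step 3: supervised selection on the refined feature set,
    # here via cross-validated LASSO.
    lasso = LassoCV(cv=5, random_state=random_state).fit(X_int[:, refined], y)
    selected = refined[np.abs(lasso.coef_) > 1e-8]
    return [names[j] for j in selected]
```

In this sketch the interaction terms are generated up front so that both the coarse and the final selection steps can retain them; the returned names therefore include interaction features (e.g., "x0 x3") alongside main effects.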
