Abstract

Feature selection is an important preprocessing step in high-dimensional regression and classification problems because it helps to avoid the effect of noisy, redundant, and irrelevant features on model performance. A variety of feature selection methods have been proposed in the literature. However, small perturbations in the training data may produce highly different feature subsets; this is known as instability. Evaluating the stability of feature selection approaches has grown in importance and popularity in recent years. This paper introduces a novel stability estimator for measuring the internal and external stability of feature subsets chosen by various methods in random subsampling experiments. The proposed estimator evaluates the similarity of features within each selected subset and measures the variation in the number of selected features across subsets from different subsampling experiments. Furthermore, the asymptotic normality of the proposed stability estimator for a large number of subsamples is established. Experiments are carried out on both simulated and real-world datasets, and the results demonstrate the usefulness of the proposed stability estimator.
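The paper's estimator itself is not given in the abstract. As an illustrative baseline only, the general idea of subset stability under subsampling is often sketched with an average pairwise similarity (here Jaccard similarity) between the feature subsets selected in each run; the function names and example data below are hypothetical, not the proposed estimator:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def subset_stability(subsets):
    """Average pairwise Jaccard similarity across subsampling runs.

    `subsets` is a list of feature-index collections, one per subsample.
    Returns a value in [0, 1]; 1 means the same subset was selected
    in every run (perfect stability).
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical example: three subsampling runs, each selecting 3 features
runs = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}]
print(round(subset_stability(runs), 3))  # → 0.5
```

Note that a simple pairwise-similarity measure like this does not separately account for variation in the number of selected features across runs, which is one of the aspects the paper's estimator is described as addressing.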

