Abstract

Feature selection is an important step when building a classifier. However, feature selection tends to be unstable on high-dimensional, small-sample-size data. This instability reduces the usefulness of the selected features for knowledge discovery: if the selected feature subset is not robust, domain experts have little reason to trust that the features are relevant. A growing number of studies deal with feature selection stability. Building on the observation that ensemble methods are commonly used to improve classifier accuracy and stability, some works have focused on the stability of ensemble feature selection methods. So far, they have obtained mixed results, and to the best of our knowledge no study has extensively examined how the choice of the aggregation method influences the stability of ensemble feature selection. This is what we study in this preliminary work. We first present several aggregation methods, then study the stability of ensemble feature selection based on them, on both artificial and real data, as well as the resulting classification performance.
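As a minimal illustration of the kind of procedure studied here (not the paper's specific method), the sketch below runs a simple univariate selector on bootstrap samples and aggregates the per-sample feature rankings by their mean rank; the selector (absolute Pearson correlation), the number of bags, and all parameter names are placeholder assumptions.

```python
import numpy as np

def ensemble_feature_selection(X, y, n_bags=20, k=5, seed=0):
    """Ensemble feature selection via mean-rank aggregation.

    Scores features on bootstrap samples with a simple univariate
    criterion (absolute Pearson correlation -- a placeholder selector),
    ranks the features within each bag, and aggregates the rankings
    by taking the mean rank of each feature across bags.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ranks = np.zeros((n_bags, d))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)      # bootstrap sample
        Xb, yb = X[idx], y[idx]
        # univariate score: |corr(feature, target)|
        scores = np.abs([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(d)])
        # rank 0 = best feature within this bag
        ranks[b] = np.argsort(np.argsort(-scores))
    mean_rank = ranks.mean(axis=0)            # aggregation step
    return np.argsort(mean_rank)[:k]          # indices of the top-k features
```

Other aggregation rules (e.g. median rank or frequency of selection in each bag's top-k) can be swapped in at the aggregation step, which is precisely the design choice whose effect on stability the paper investigates.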
