Robust ensemble feature selection for high dimensional data sets

Afef Ben Brahim, Mohamed Limam

doi:10.1109/hpcsim.2013.6641406

Abstract

Feature selection is an important and frequently used data preprocessing technique for data mining on large scale data sets. Several feature selection methods exist in the literature. Each uses a specific feature evaluation criterion and may produce a different feature subset even when applied to the same data set. No single resulting subset is better than the others; each is the best subset of the whole feature space under its own criterion. Finding a way to take advantage of different feature selection methods simultaneously is a challenging data mining problem. Recently, the concept of ensemble feature selection has been introduced to help solve this problem. Multiple feature selections are combined in order to produce more robust feature subsets and better classification results. However, one of the most critical decisions when performing ensemble feature selection is the choice of aggregation technique for combining the feature lists produced by the multiple algorithms into a single decision for each feature. In this paper, we propose a robust feature aggregation technique to combine the results of three different filter methods. Our aggregation technique measures each feature selection algorithm's confidence and its conflict with the other algorithms in order to assign a reliability factor that guides the final feature selection. Experiments on high dimensional data sets show that the proposed approach outperforms the single feature selection algorithms as well as two well known aggregation methods in terms of classification performance.
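
To make the aggregation step concrete, the following is a minimal sketch of a generic weighted mean-rank aggregation over the score lists of several filters. It is not the technique proposed in this paper: the abstract does not define the confidence and conflict measures, so the `weights` argument is only a hypothetical stand-in for a per-algorithm reliability factor, and the function names and toy scores are assumptions made for illustration.

```python
import numpy as np

def scores_to_ranks(scores):
    """Convert relevance scores (higher is better) to ranks (0 is best)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks

def aggregate_rankings(score_lists, weights=None, k=2):
    """Weighted mean-rank aggregation (illustrative, not the paper's method).

    score_lists: one list of per-feature scores per filter method.
    weights:     hypothetical per-filter reliability weights; equal if None.
    Returns the indices of the k features with the lowest weighted mean rank.
    """
    ranks = np.array([scores_to_ranks(s) for s in score_lists], dtype=float)
    if weights is None:
        weights = np.ones(len(score_lists))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalize so weights sum to 1
    mean_rank = weights @ ranks            # one aggregated rank per feature
    return np.argsort(mean_rank, kind="stable")[:k]

# Toy scores from three hypothetical filters over four features.
filter_scores = [
    [0.9, 0.1, 0.5, 0.3],
    [0.8, 0.2, 0.6, 0.1],
    [0.2, 0.9, 0.7, 0.4],
]
selected_equal = aggregate_rankings(filter_scores, k=2)
selected_weighted = aggregate_rankings(filter_scores, weights=[1, 1, 4], k=2)
```

With equal weights the two filters that agree dominate the result; raising the weight of the third filter shifts the selection toward its preferred features, which is the kind of effect a reliability factor is meant to control.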
