Mrmr+ and Cfs+ feature selection algorithms for high-dimensional data

Adrian Pino Angulo, Kilho Shin

doi:10.1007/s10489-018-1381-1

Abstract

Feature selection is a central issue in machine learning and applied mathematics. Filter feature selection algorithms aim to solve the optimization problem of selecting a set of features that maximizes feature-class correlation and minimizes feature-feature correlation. Mrmr (Minimum Redundancy Maximum Relevance) and Cfs (Correlation-based Feature Selection) are among the best-known algorithms that find an approximate solution to this optimization problem. However, the volume of available data keeps growing, which makes the feature selection process increasingly challenging. In this paper, we propose two new versions of Mrmr and Cfs that output the same feature set as the original algorithms but are considerably faster. Our novel algorithms are based on solving the duplication and redundancy problems intrinsic to the original algorithms. We applied our proposals to thirty datasets from the field of microarray and cancer analysis. Experiments revealed that the proposed algorithms, Mrmr+ and Cfs+, are on average fourteen and three times faster than the original Mrmr and Cfs, respectively.
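To make the optimization problem concrete, the following is a minimal sketch of a baseline greedy selection of the Mrmr kind, assuming discrete features and the common "relevance minus mean redundancy" criterion with mutual information as the correlation measure. It is not the Mrmr+ algorithm proposed in the paper; the names `mutual_information` and `greedy_mrmr` and the dictionary-based input are hypothetical choices made only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(features, labels, k):
    """Greedily pick k feature names (assumed criterion: relevance - mean redundancy).

    features: dict mapping a feature name to its list of discrete values.
    labels:   list of class labels, same length as each feature column.
    """
    # Feature-class correlation, computed once per feature.
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)

    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            # Mean feature-feature correlation with the features already chosen.
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch every feature-feature term is recomputed in each round, which is the kind of repeated work a faster variant would presumably avoid; how Mrmr+ and Cfs+ actually do so is described in the full text, not here.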

Full Text