A novel features ranking metric with application to scalable visual and bioinformatics data classification

Quan Zou, Jiancang Zeng, Liujuan Cao, Rongrong Ji

doi:10.1016/j.neucom.2014.12.123

Abstract

With the arrival of the big data era, filtering out uninformative data has become an increasingly important task. To this end, ranking high-dimensional features plays an important role. However, most state-of-the-art methods focus on improving classification accuracy, while the stability of the dimensionality reduction is largely ignored. In this paper, we propose a Max-Relevance-Max-Distance (MRMD) feature ranking method, which balances the accuracy and stability of feature ranking and of the prediction task. To demonstrate its effectiveness on big data, we tested our method on two different datasets. The first is an image classification benchmark with high dimensionality; the second is protein–protein interaction prediction data, which comes from our previous private research and contains a massive number of instances. Experiments show that our method maintains both accuracy and stability on the two datasets. Moreover, our method runs faster than other filtering and wrapping methods, such as mRMR and Information Gain.
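To make the idea of combining relevance and distance concrete, a minimal sketch follows. The abstract does not define either term, so everything here is an assumption made for illustration: relevance is taken as the absolute Pearson correlation between a feature and the label, distance as the mean Euclidean distance between a standardised feature and all other features, and the two are summed with equal weight. The function name `mrmd_style_rank` is hypothetical and this is not presented as the authors' implementation.

```python
import numpy as np


def mrmd_style_rank(X, y):
    """Rank features by an assumed relevance + distance score (best first).

    Illustrative only: the relevance and distance definitions below are
    assumptions, not taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Standardise each column; constant columns are left at zero.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Assumed relevance: |Pearson correlation| of each feature with the label.
    y_centered = y - y.mean()
    y_std = y.std()
    if y_std == 0:
        relevance = np.zeros(d)
    else:
        relevance = np.abs(Z.T @ y_centered) / (n * y_std)

    # Assumed distance: mean Euclidean distance to every other feature,
    # computed from the Gram matrix of the standardised columns.
    gram = Z.T @ Z
    sq_norms = np.diag(gram)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    dist = np.sqrt(np.clip(sq_dist, 0.0, None))
    mean_dist = dist.sum(axis=1) / max(d - 1, 1)

    # Scale distances to [0, 1] so both terms share a comparable range.
    max_dist = mean_dist.max()
    if max_dist > 0:
        mean_dist = mean_dist / max_dist

    scores = relevance + mean_dist
    order = np.argsort(-scores)  # indices of features, highest score first
    return order, scores
```

Calling `order, scores = mrmd_style_rank(X, y)` on a samples-by-features matrix returns the feature indices sorted by score. The equal weighting of the two terms is a design choice of this sketch; a weighted sum would let relevance and redundancy be traded off differently.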

Full Text