Abstract

With the arrival of the big data era, filtering out uninformative data has become increasingly important. Ranking high-dimensional features plays a key role in this task. However, most state-of-the-art methods focus on improving classification accuracy and ignore the stability of the dimensionality reduction. In this paper, we propose a Max-Relevance-Max-Distance (MRMD) feature ranking method that balances the accuracy and the stability of feature ranking and of the subsequent prediction task. To demonstrate its effectiveness on big data, we tested the method on two different datasets. The first is an image classification benchmark with high dimensionality. The second is protein–protein interaction prediction data, which comes from our previous private research and contains a very large number of instances. Experiments show that our method maintains both accuracy and stability on the two datasets. Moreover, our method runs faster than other filter and wrapper methods, such as mRMR and Information Gain.
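The abstract does not state the scoring formulas, so the following is only a minimal sketch of what a relevance-plus-distance ranking could look like, not the authors' implementation. It assumes that relevance is the absolute Pearson correlation between a feature and the label, that distance is the mean Euclidean distance from a feature to every other feature, and that the two terms are summed with a hypothetical `weight` parameter. The function names (`pearson`, `euclidean`, `rank_features`) are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def euclidean(a, b):
    """Euclidean distance between two feature columns."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rank_features(X, y, weight=1.0):
    """Return feature indices sorted from highest to lowest score.

    X is a list of rows (instances), y is a list of numeric labels.
    Assumed score: |corr(feature, y)| + weight * normalized mean distance
    to the other features. This is an illustrative choice, not a formula
    taken from the paper.
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]

    # Relevance term: how strongly each feature tracks the label.
    relevance = [abs(pearson(col, y)) for col in cols]

    # Distance term: how far each feature is, on average, from the others.
    distance = []
    for i in range(n_features):
        others = [euclidean(cols[i], cols[j]) for j in range(n_features) if j != i]
        distance.append(sum(others) / len(others) if others else 0.0)

    # Scale distances into [0, 1] so both terms are comparable.
    max_d = max(distance) or 1.0
    scores = [relevance[i] + weight * distance[i] / max_d for i in range(n_features)]

    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)
```

Setting `weight=0.0` reduces the sketch to a pure relevance ranking, which makes it easy to see how much the distance term changes the order on a given dataset.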

