Abstract

Wrapper-based feature subset selection (FSS) methods tend to achieve better classification accuracy than filter methods but are considerably more time-consuming, particularly for applications with thousands of features, such as microarray data analysis. Accelerating this process without degrading its high accuracy would be of great value for gene expression analysis. In this study, we explored how to reduce the time complexity of wrapper-based FSS with an embedded K-Nearest-Neighbor (KNN) classifier. Instead of treating KNN as a black box, we proposed to construct a distance matrix for the classifier and update it incrementally, thereby accelerating the computation of the relevance criterion used to evaluate candidate features. Extensive experiments on eight publicly available microarray datasets were first conducted to demonstrate the effectiveness of wrapper methods with KNN for selecting informative features. To quantify the reduction in time cost, we then ran experiments on the same eight microarray datasets with the embedded KNN classifier and analyzed the theoretical time and space complexity. Both the experimental results and the theoretical analysis showed that the proposed approach markedly accelerates the wrapper-based feature selection process without degrading the high classification accuracy, and the space complexity analysis indicated that the additional space overhead is affordable in practice.
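To make the core idea concrete, the following is a minimal sketch of how an incrementally updated distance matrix can speed up wrapper-based forward selection with KNN. It assumes squared Euclidean distance (which decomposes additively over features), leave-one-out accuracy as the relevance criterion, and greedy forward selection; all function names and parameters here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def sq_dist_one_feature(col):
    """Pairwise squared-difference matrix contributed by a single feature column."""
    diff = col[:, None] - col[None, :]
    return diff * diff

def loo_knn_accuracy(D, y, k=3):
    """Leave-one-out accuracy of a KNN classifier, given a precomputed
    n x n squared-distance matrix D and label vector y."""
    n = len(y)
    D = D.copy()
    np.fill_diagonal(D, np.inf)                    # exclude each sample from its own neighbors
    correct = 0
    for i in range(n):
        nn = np.argpartition(D[i], k)[:k]          # indices of the k nearest neighbors
        labels, counts = np.unique(y[nn], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:      # majority vote
            correct += 1
    return correct / n

def forward_select(X, y, n_select, k=3):
    """Greedy forward selection; the distance matrix over the selected features
    is maintained incrementally instead of being recomputed from scratch."""
    n, d = X.shape
    selected, remaining = [], list(range(d))
    D_base = np.zeros((n, n))                      # distances over currently selected features
    for _ in range(n_select):
        best_f, best_acc = None, -1.0
        for f in remaining:
            # Evaluating a candidate costs only one rank-one style update, O(n^2),
            # rather than rebuilding distances over the whole candidate subset.
            D_cand = D_base + sq_dist_one_feature(X[:, f])
            acc = loo_knn_accuracy(D_cand, y, k)
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
        D_base += sq_dist_one_feature(X[:, best_f])  # commit the chosen feature
    return selected
```

In this sketch the extra storage is a single n x n matrix over the samples, which is small for typical microarray data (tens to hundreds of samples), consistent with the abstract's claim that the space overhead is affordable in practice.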
