Abstract

Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in many real-world applications (e.g. neuroscience, bioinformatics and psychology). This study proposed a method, named feature selection based on data quality and variable training samples (QVT), to improve the performance of feature selection methods on ultra-low-sample-size data. Given that no feature selection method performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be combined with any feature selection method. Furthermore, unlike existing methods that try to extract a stable feature subset from low-sample-size data by increasing the sample size or using more complicated algorithms, QVT seeks improvement using the original data alone. An experiment was performed on 20 benchmark datasets with three feature selection methods and three classifiers to verify the feasibility of QVT; the results showed that features selected by QVT achieve higher classification accuracy than features selected by the underlying feature selection method alone, and the differences are significant.
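To make the variable-training-samples idea concrete, the sketch below illustrates the general pattern the abstract describes — running a base feature selection method over random training subsets of varying size and aggregating the selections by vote. The base ranker (absolute Pearson correlation with the label), the subset sizes, and the voting scheme are all illustrative assumptions, not the authors' QVT implementation:

```python
import random

def rank_features(X, y, k):
    """Illustrative base selector: rank features by |Pearson correlation|
    with the label and keep the indices of the top k."""
    n, d = len(X), len(X[0])
    def corr(j):
        xs = [row[j] for row in X]
        mx, my = sum(xs) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(xs, y))
        vx = sum((a - mx) ** 2 for a in xs) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        return abs(cov / (vx * vy)) if vx and vy else 0.0
    scores = sorted(((corr(j), j) for j in range(d)), reverse=True)
    return {j for _, j in scores[:k]}

def select_with_variable_samples(X, y, k, sizes, trials=20, seed=0):
    """Assumed aggregation scheme: run the base selector on random
    subsamples of varying size and keep the k most-voted features."""
    rng = random.Random(seed)
    votes = [0] * len(X[0])
    idx = list(range(len(X)))
    for _ in range(trials):
        m = rng.choice(sizes)           # vary the training sample size
        sub = rng.sample(idx, m)        # draw a random training subset
        for j in rank_features([X[i] for i in sub],
                               [y[i] for i in sub], k):
            votes[j] += 1
    return sorted(range(len(votes)), key=lambda j: -votes[j])[:k]
```

On a toy dataset where one feature tracks the label and the others are noise, the informative feature dominates the vote; the point of varying the subset size is that features selected consistently across sample sizes are less likely to be artifacts of any single small sample.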
