Abstract

Feature selection is a key step in the analysis of high-dimensional, small-sample data. Its core is to analyse and quantify the relevance between features and class labels and the redundancy among features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore the influence of inter-feature redundancy and relevance. Therefore, through a study and analysis of existing feature selection ideas and methods, this paper proposes a nonlinear dynamic conditional relevance feature selection algorithm (NDCRFS). Firstly, redundancy and relevance between features, and between features and class labels, are discriminated using mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and the candidate features are dynamically weighted using information gain factors. Finally, to evaluate its performance, NDCRFS was validated against six other feature selection algorithms on three classifiers, using 12 different data sets, comparing both the variability between algorithms and their classification metrics. The experimental results show that NDCRFS can improve the quality of the selected feature subsets and obtain better classification results.
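The published scoring function is not reproduced in this excerpt, but the three steps above suggest a greedy forward search over features. Below is a minimal sketch of such a loop, assuming pre-discretized integer features; the `gain` weighting is a hypothetical stand-in for the paper's information gain factor, not the actual NDCRFS criterion.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def ndcrfs_sketch(X, y, k):
    """Greedy forward selection sketch (not the exact NDCRFS score).

    X : (n_samples, n_features) integer array of discretized features
    y : (n_samples,) array of class labels
    k : number of features to select
    """
    n_features = X.shape[1]
    # Relevance of each feature: I(f_j; y).
    relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # seed with the most relevant feature
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean MI between the candidate and selected features.
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s])
                                  for s in selected])
            # Hypothetical dynamic weight standing in for the paper's
            # information gain factor: shrinks relevance as redundancy grows.
            gain = relevance[j] / (relevance[j] + redundancy + 1e-12)
            score = gain * relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```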

Highlights

  • In the era of big data, the dimensionality of small-sample data has increased dramatically, leading to the curse of dimensionality

  • Feature selection algorithms in information theory are further divided into mutual information metrics [17, 18], conditional mutual information metrics [1, 19], interactive mutual information metrics [20-22], and so on; these methods only measure the classification contribution of individual features and ignore redundancy and interaction among features

  • NDCRFS was compared with six feature selection algorithms, MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE, to verify its effectiveness (a sketch of this evaluation protocol follows this list)
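As a concrete illustration of this evaluation protocol, the sketch below uses the simplest of the compared baselines, MIM (rank features by mutual information with the class label), and scores the selected subset with three off-the-shelf classifiers. The data set, bin count, subset size, and classifier choices here are illustrative assumptions rather than the paper's exact experimental setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC

# MIM baseline: rank features by mutual information with the class label.
X, y = load_breast_cancer(return_X_y=True)
X_disc = KBinsDiscretizer(n_bins=5, encode="ordinal",
                          strategy="uniform").fit_transform(X).astype(int)
mi = np.array([mutual_info_score(X_disc[:, j], y) for j in range(X.shape[1])])
selected = np.argsort(mi)[::-1][:10]  # top-10 features by MI

# Score three common classifiers on the selected subset via cross-validation.
for clf in (KNeighborsClassifier(), SVC(), GaussianNB()):
    acc = cross_val_score(clf, X[:, selected], y, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))
```

Swapping the MIM ranking for any other criterion (CMIM, CIFE, or the NDCRFS sketch above) turns this into the head-to-head comparison the highlight describes.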


Introduction

In the era of big data, the dimensionality of small-sample data has increased dramatically, leading to the curse of dimensionality. High-dimensional data contain many irrelevant and redundant features, which increase computational complexity and reduce the accuracy and efficiency of classification methods. Feature selection methods can be classified into three types: filter methods [10, 11], wrapper methods [12], and embedded methods [13]. Due to their high computational efficiency and generality, filter methods are applied to ultra-high-dimensional data sets. According to the metric used, filter feature selection methods can be further classified into rough-set [14], statistics-based [15], and information-based [16] approaches. Feature selection algorithms in information theory are further divided into mutual information metrics [17, 18], conditional mutual information metrics [1, 19], interactive mutual information metrics [20-22], and so on; these methods only measure the classification contribution of individual features and ignore the redundancy and interaction among features.
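For discrete (or discretized) variables, these three metrics can all be estimated by plug-in counts over the empirical distribution. A minimal sketch, assuming integer-coded arrays:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def conditional_mutual_info(x, y, z):
    """I(X; Y | Z) = sum_z p(z) * I(X; Y | Z=z) for discrete arrays."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)                 # stratify on each value of Z
        cmi += mask.mean() * mutual_info_score(x[mask], y[mask])
    return cmi

def interaction_info(x, y, z):
    """Interaction information I(X; Y; Z) = I(X; Y) - I(X; Y | Z).

    Under this sign convention, positive values indicate that Z is
    redundant with X about Y, and negative values indicate synergy;
    note that the opposite sign convention also appears in the literature.
    """
    return mutual_info_score(x, y) - conditional_mutual_info(x, y, z)
```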
