Abstract

The analysis of large amounts of data with high dimensionality in both rows and columns increases the computational burden of machine learning algorithms. Such data are likely to contain noise and, consequently, degrade the performance of machine learning algorithms. Feature selection (FS) is one of the most essential machine learning techniques for addressing this problem. It aims to identify and eliminate as much irrelevant information as possible and to retain only a minimal subset of appropriate features. It plays an important role in improving the accuracy of machine learning algorithms, and it also reduces computational complexity, run time, storage, and cost. In this paper, a new feature selection algorithm based on feature stability and correlation is proposed to select an effective minimal subset of appropriate features. The efficiency of the proposed algorithm was evaluated by comparing it with other state-of-the-art dimensionality reduction (DR) algorithms on benchmark datasets. The evaluation criteria included the size of the minimal subset, classification accuracy, the F-measure, and the area under the curve (AUC). The results showed that the proposed algorithm outperforms the compared algorithms in reducing a given dataset while maintaining high predictive accuracy.
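The abstract does not give the details of the proposed stability-and-correlation algorithm, but the general idea of a correlation-based filter — dropping each feature that is highly correlated with one already kept, so that only a small subset of non-redundant features remains — can be illustrated with a minimal sketch. The function name, the threshold value, and the synthetic data below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Greedy redundancy filter (illustrative, not the paper's algorithm):
    keep a feature only if its absolute Pearson correlation with every
    already-kept feature is below `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic example: three independent features plus a near-duplicate
# of feature 0, which the filter should discard.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=200)])
print(correlation_filter(X))  # feature 3 is dropped as redundant
```

A real FS method would also score each feature's relevance to the class label (e.g. via mutual information) before pruning redundancy; this sketch shows only the redundancy-removal half.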
