Abstract

High-dimensional biomedical data contain thousands of features, and accurately identifying the principal features in these data can support the classification of related samples. However, the large number of irrelevant or redundant features seriously degrades classification accuracy. To address this problem, a new feature selection algorithm based on redundancy removal is proposed in this study. First, two redundancy criteria are determined from vertical relevance and horizontal relevance. Second, an approximate redundant-feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. Finally, to evaluate the effectiveness of the proposed method, contrast experiments against classic feature selection algorithms are conducted using K-nearest neighbour (KNN) classifiers, and the results show that the algorithm can effectively improve classification accuracy.
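The contrast experiments described above evaluate selected feature subsets with a KNN classifier. The paper does not give its KNN configuration, so the following is only a minimal sketch of such a classifier (Euclidean distance, majority vote) to make the evaluation setup concrete; the function name and the choice of k are illustrative, not taken from the paper.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Minimal KNN classifier: for each test point, take a majority vote
    over the labels of its k nearest training points (Euclidean distance).
    Sketch only; k and the distance metric are assumptions."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
        preds.append(np.argmax(np.bincount(nearest))) # majority vote
    return np.array(preds)
```

After feature selection, classification accuracy is then simply the fraction of test labels this predictor gets right on the retained feature columns.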

Highlights

  • High-dimensional data analysis (Tamaresis et al, 2014) is a very active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017)

  • Feature selection is considered an essential procedure in high-dimensional data processing

  • The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018)



Introduction

High-dimensional data analysis (Tamaresis et al, 2014) is a very active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017). High-dimensional data contain many weakly relevant or irrelevant features. If all features are treated equally, both the computational complexity and the accuracy of the prediction can be seriously affected. Feature selection is therefore considered an essential procedure in high-dimensional data processing. Feature selection (Saeys et al, 2007) refers to selecting relevant features while removing irrelevant and redundant ones. As an important part of knowledge discovery technology, feature selection can effectively speed up subsequent prediction algorithms, enhance the compactness of the prediction model, and increase the generalization ability of the corresponding model. The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018).
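The idea of keeping features that are relevant to the class label while discarding those that are redundant with already-kept features can be sketched as a simple MI-based filter. This is only a generic illustration under stated assumptions: the paper's vertical/horizontal relevance criteria are not specified here, so a single redundancy threshold on pairwise feature MI stands in for them, and both function names and the threshold value are hypothetical.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in empirical mutual information (in nats) between two
    discrete-valued vectors, MI = sum p(x,y) * log(p(x,y)/(p(x)p(y)))."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint probability
            if pxy > 0:
                px = np.mean(x == xv)
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def select_features(X, y, redundancy_threshold=0.5):
    """Greedy MI filter: rank features by relevance MI(feature, label),
    skip irrelevant ones (MI == 0), and drop a candidate whenever its MI
    with an already-selected feature exceeds the threshold (a simple
    stand-in for the paper's redundancy criteria)."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    order = np.argsort(relevance)[::-1]            # most relevant first
    selected = []
    for j in order:
        if relevance[j] <= 0:                      # irrelevant feature
            continue
        if all(mutual_info(X[:, j], X[:, k]) < redundancy_threshold
               for k in selected):
            selected.append(j)
    return selected
```

On a toy matrix whose second column duplicates the first, the filter keeps only one of the two copies, illustrating how redundancy removal shrinks the feature set without losing label information.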
