Abstract
High-dimensional biomedical data contain thousands of features, and accurate identification of the key features in these data supports classification of related data. However, a large number of irrelevant or redundant features usually degrades classification accuracy seriously. To address this problem, a new feature selection algorithm based on redundancy removal is proposed in this study. Firstly, two redundancy criteria are determined by vertical relevance and horizontal relevance. Secondly, an approximate redundant-feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. Finally, to evaluate the effectiveness of the proposed method, comparison experiments against classic feature selection algorithms are conducted using K-nearest neighbour (KNN) classifiers, and the results show that our algorithm can effectively improve classification accuracy.
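The exact redundancy criteria of the proposed algorithm are not spelled out in the abstract, but the overall pipeline (MI-based feature filtering followed by KNN evaluation) can be sketched generically. The snippet below is a minimal illustration under that assumption, using synthetic data and scikit-learn's `mutual_info_classif`; it is not the paper's algorithm itself.

```python
# Hypothetical sketch of the generic pipeline: rank features by mutual
# information with the class label, keep a small subset, and evaluate with KNN.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "high-dimensional" data: a few informative features
# buried among many irrelevant ones.
X, y = make_classification(n_samples=200, n_features=100, n_informative=5,
                           n_redundant=10, random_state=0)

# Feature-class MI (the "vertical relevance" in the paper's terminology,
# as we interpret it here).
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:15]          # keep the 15 highest-MI features

knn = KNeighborsClassifier(n_neighbors=5)
acc_all = cross_val_score(knn, X, y, cv=5).mean()          # all 100 features
acc_sel = cross_val_score(knn, X[:, top], y, cv=5).mean()  # selected subset
print(f"accuracy, all features: {acc_all:.3f}; selected subset: {acc_sel:.3f}")
```

On data like this, KNN typically benefits from discarding the uninformative dimensions, which is the effect the paper's experiments measure.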
Highlights
High-dimensional data analysis (Tamaresis et al, 2014) is a highly active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017)
Feature selection is considered an essential procedure in high-dimensional data processing
The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018)
Summary
High-dimensional data analysis (Tamaresis et al, 2014) is a highly active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017). High-dimensional data contain many weakly relevant or irrelevant features. If all features are treated equally, the computational complexity and the accuracy of the prediction can be seriously affected. Feature selection is therefore considered an essential procedure in high-dimensional data processing. Feature selection (Saeys et al, 2007) refers to selecting relevant features while removing irrelevant and redundant ones. As an important part of knowledge discovery technology, feature selection can effectively speed up subsequent prediction algorithms, enhance the compactness of the prediction model, and increase the generalization ability of the corresponding model. The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018).
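The summary distinguishes removing *irrelevant* features (low relation to the label) from removing *redundant* ones (high relation to already-selected features). One common way to realize this distinction with MI is a greedy filter: scan features in order of relevance and keep a candidate only if its MI with every feature kept so far stays below a threshold. The sketch below assumes this mRMR-style greedy scheme purely for illustration; the paper's own redundancy criteria may differ.

```python
# Hypothetical greedy redundancy-removal sketch (mRMR-like), assumed here
# as one concrete realization of "remove irrelevant and redundant features".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=150, n_features=30, n_informative=4,
                           n_redundant=8, random_state=1)

relevance = mutual_info_classif(X, y, random_state=1)   # feature-class MI
order = np.argsort(relevance)[::-1]                     # most relevant first

selected, threshold = [], 0.5   # threshold is an illustrative choice
for j in order[:15]:            # scan the 15 most relevant candidates
    # Drop the candidate if it shares too much information (feature-feature MI)
    # with any feature already kept.
    redundant = any(
        mutual_info_regression(X[:, [k]], X[:, j], random_state=1)[0] > threshold
        for k in selected
    )
    if not redundant:
        selected.append(j)

print("kept features:", selected)
```

The greedy scan is cheap (no classifier in the loop), which is why filter-style selectors like this are attractive for data with thousands of features.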