Abstract
High-dimensional biomedical data contain thousands of features, and accurate identification of the key features in these data supports classification of related data. However, a large number of irrelevant or redundant features usually degrades classification accuracy seriously. To address this problem, a new feature selection algorithm based on redundancy removal is proposed in this study. Firstly, two redundancy criteria are determined by vertical relevance and horizontal relevance. Secondly, an approximate redundant-feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. Finally, to evaluate the effectiveness of the proposed method, comparison experiments against classic feature selection algorithms are conducted using K-nearest neighbour (KNN) classifiers, and the results show that our algorithm can effectively improve classification accuracy.
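The exact redundancy criteria of the proposed algorithm are not spelled out in the abstract, but the overall pipeline (MI-based feature filtering followed by KNN evaluation) can be sketched generically. The snippet below is a minimal illustration under that assumption, using synthetic data and scikit-learn's `mutual_info_classif`; it is not the paper's algorithm itself.

```python
# Hypothetical sketch of the generic pipeline: rank features by mutual
# information with the class label, keep a small subset, and evaluate with KNN.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "high-dimensional" data: a few informative features
# buried among many irrelevant ones.
X, y = make_classification(n_samples=200, n_features=100, n_informative=5,
                           n_redundant=10, random_state=0)

# Feature-class MI (the "vertical relevance" in the paper's terminology,
# as we interpret it here).
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:15]          # keep the 15 highest-MI features

knn = KNeighborsClassifier(n_neighbors=5)
acc_all = cross_val_score(knn, X, y, cv=5).mean()          # all 100 features
acc_sel = cross_val_score(knn, X[:, top], y, cv=5).mean()  # selected subset
print(f"accuracy, all features: {acc_all:.3f}; selected subset: {acc_sel:.3f}")
```

On data like this, KNN typically benefits from discarding the uninformative dimensions, which is the effect the paper's experiments measure.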
Highlights
High-dimensional data analysis (Tamaresis et al, 2014) is a highly active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017)
Feature selection is considered an essential procedure in high-dimensional data processing
The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018)
Summary
High-dimensional data analysis (Tamaresis et al, 2014) is a highly active research area, especially for cancer data (Lee et al, 2013) and mental illness data (Jiang et al, 2017; Li et al, 2017). High-dimensional data contain many weakly relevant or irrelevant features. If all features are treated equally, the computational complexity and the accuracy of the prediction can be seriously affected. Feature selection is therefore considered an essential procedure in high-dimensional data processing. Feature selection (Saeys et al, 2007) refers to selecting relevant features while removing irrelevant and redundant ones. As an important part of knowledge discovery technology, feature selection can effectively speed up subsequent prediction algorithms, enhance the compactness of the prediction model, and increase the generalization ability of the corresponding model. The major purpose of feature selection in high-dimensional data is to overcome the curse of dimensionality (Li et al, 2016; Zhang et al, 2018).
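The summary distinguishes removing *irrelevant* features (low relation to the label) from removing *redundant* ones (high relation to already-selected features). One common way to realize this distinction with MI is a greedy filter: scan features in order of relevance and keep a candidate only if its MI with every feature kept so far stays below a threshold. The sketch below assumes this mRMR-style greedy scheme purely for illustration; the paper's own redundancy criteria may differ.

```python
# Hypothetical greedy redundancy-removal sketch (mRMR-like), assumed here
# as one concrete realization of "remove irrelevant and redundant features".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=150, n_features=30, n_informative=4,
                           n_redundant=8, random_state=1)

relevance = mutual_info_classif(X, y, random_state=1)   # feature-class MI
order = np.argsort(relevance)[::-1]                     # most relevant first

selected, threshold = [], 0.5   # threshold is an illustrative choice
for j in order[:15]:            # scan the 15 most relevant candidates
    # Drop the candidate if it shares too much information (feature-feature MI)
    # with any feature already kept.
    redundant = any(
        mutual_info_regression(X[:, [k]], X[:, j], random_state=1)[0] > threshold
        for k in selected
    )
    if not redundant:
        selected.append(j)

print("kept features:", selected)
```

The greedy scan is cheap (no classifier in the loop), which is why filter-style selectors like this are attractive for data with thousands of features.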