Abstract

Non-parametric tests are often used for feature selection in gene expression data, which are characterized by high noise and high redundancy. To select features, remove redundancy, and obtain feature subsets with higher classification accuracy, this paper proposes a feature selection algorithm based on the Jonckheere-Terpstra test and applies it to both binary and multiclass gene expression datasets. The algorithm first uses the Jonckheere-Terpstra test to select features and thereby reduce the data dimension, and then uses the Pearson correlation test to remove redundancy among the selected features, yielding the final feature subset. To evaluate classification performance, support vector machine and k-nearest neighbor classifiers were each used in classification experiments. The experimental results show that the feature subsets produced by the algorithm give high classification accuracy on both binary and multiclass datasets.
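
The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`jt_p_value`, `pearson`, `select_features`), the thresholds `alpha=0.05` and `max_abs_r=0.9`, the normal approximation without tie correction, the greedy redundancy rule, and the toy data are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def jt_p_value(values, labels):
    """Two-sided Jonckheere-Terpstra p-value via the normal approximation.

    Assumption: classes are ordered by their sorted label values, and no
    tie correction is applied to the variance.
    """
    classes = sorted(set(labels))
    groups = [[v for v, c in zip(values, labels) if c == k] for k in classes]
    # J sums, over every ordered pair of classes, the number of (x, y)
    # pairs with x < y; ties count one half.
    j_stat = 0.0
    for a, b in combinations(range(len(groups)), 2):
        for x in groups[a]:
            for y in groups[b]:
                j_stat += 1.0 if x < y else 0.5 if x == y else 0.0
    n = len(values)
    sizes = [len(g) for g in groups]
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    if var == 0:
        return 1.0
    z = (j_stat - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, labels, alpha=0.05, max_abs_r=0.9):
    # Stage 1: keep features with a JT p-value below alpha (hypothetical
    # threshold), ordered from most to least significant.
    scored = [(jt_p_value(col, labels), idx) for idx, col in enumerate(columns)]
    ranked = [idx for p, idx in sorted(scored) if p < alpha]
    # Stage 2: greedy redundancy removal; drop a feature that is strongly
    # correlated with one already kept (hypothetical rule).
    kept = []
    for idx in ranked:
        if all(abs(pearson(columns[idx], columns[k])) < max_abs_r for k in kept):
            kept.append(idx)
    return kept


# Toy data: 12 samples in three ordered classes, one list per feature.
labels = [0] * 4 + [1] * 4 + [2] * 4
f0 = [float(v) for v in range(1, 13)]            # clear trend across classes
f1 = [2.0 * v + 1.0 for v in f0]                 # linear copy of f0 (redundant)
f2 = [3.0, 7.0, 1.0, 9.0, 8.0, 2.0, 6.0, 4.0, 5.0, 1.0, 9.0, 3.0]  # no trend
f3 = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 10.0, 20.0, 40.0, 80.0]  # nonlinear trend
columns = [f0, f1, f2, f3]

selected = select_features(columns, labels)
print(selected)
```

On this toy data the sketch keeps features 0 and 3: feature 2 shows no trend across the classes, and feature 1 is a linear copy of feature 0. The retained columns would then be passed to a classifier such as a support vector machine or k-nearest neighbor, which this sketch leaves out.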
