Research on Gene Expression Data Based on Jonckheere-Terpstra Test

Changyin Zhou, Qiuyu Xu

doi:10.1016/j.procs.2022.10.063

Abstract

Non-parametric tests are often used for feature selection in gene expression data. Such data are typically noisy and highly redundant. To select features, remove redundancy, and obtain feature subsets with higher classification accuracy, this paper proposes a feature selection algorithm based on the Jonckheere-Terpstra test and applies it to both binary and multiclass gene expression datasets. The algorithm first applies the Jonckheere-Terpstra test to each dataset to select features and reduce the data dimension, and then uses the Pearson correlation test to remove redundancy among the selected features, yielding the final feature subset. To evaluate classification performance, support vector machine and k-nearest neighbor classifiers were each used in classification experiments. The experimental results show that the feature subsets obtained after feature selection and redundancy removal achieve high classification accuracy on both binary and multiclass datasets.
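The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `jt_pvalue` and `select_features`, the significance level `alpha`, and the correlation cutoff `max_corr` are all assumptions introduced here, since the abstract does not give the thresholds used. The Jonckheere-Terpstra p-value is computed with the large-sample normal approximation and no tie correction, the class ordering is assumed to follow the sorted label values, and redundancy is removed by a plain cutoff on the absolute Pearson correlation coefficient.

```python
import math
import numpy as np


def jt_pvalue(values, labels):
    """Two-sided Jonckheere-Terpstra p-value (normal approximation, no tie correction)."""
    classes = np.unique(labels)  # sorted labels define the assumed class ordering
    groups = [values[labels == c] for c in classes]
    sizes = np.array([len(g) for g in groups], dtype=float)
    n = sizes.sum()
    # J sums, over ordered class pairs (a < b), the pairs where the later class is larger;
    # ties count one half.
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            diff = groups[b][None, :] - groups[a][:, None]
            j_stat += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    mean = (n**2 - np.sum(sizes**2)) / 4.0
    var = (n**2 * (2 * n + 3) - np.sum(sizes**2 * (2 * sizes + 3))) / 72.0
    if var <= 0:
        return 1.0
    z = (j_stat - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(X, y, alpha=0.05, max_corr=0.9):
    """Stage 1: keep features with JT p-value below alpha (assumed threshold).
    Stage 2: drop a feature if it correlates strongly with one already kept."""
    pvals = np.array([jt_pvalue(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(pvals, kind="stable")  # most significant first
    candidates = [j for j in order if pvals[j] < alpha]
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_corr for k in kept):
            kept.append(int(j))
    return kept


# Toy data (hypothetical, not from the paper): 3 classes, 20 samples each.
y = np.repeat([0, 1, 2], 20)
t = np.tile(np.arange(20.0), 3)
informative = 10.0 * y + 0.1 * t      # increases with the class label
duplicate = 2.0 * informative + 1.0   # redundant copy of the informative feature
unrelated = t                         # same distribution in every class
X = np.column_stack([informative, duplicate, unrelated])

selected = select_features(X, y)
print(selected)
```

On this toy matrix the unrelated column fails the Jonckheere-Terpstra stage and the duplicate column is dropped at the correlation stage, so only column 0 remains. The retained columns could then be passed to a support vector machine or k-nearest neighbor classifier, as in the experiments the abstract describes.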

Full Text