Abstract

This paper introduces an approach to classifying RNA-seq read count data using Gaussian process (GP) models. RNA-seq data are first transformed into microarray-like data, and the two-sample t-test is then applied for gene selection. The GP classifier takes the discriminant genes selected by the t-test as its inputs. The proposed approach is verified on two real benchmark datasets using five-fold cross-validation. The classifiers are evaluated with several performance metrics, including accuracy, F-measure, area under the ROC curve, and mutual information. Experimental results show that the GP classifier significantly outperforms the competing methods, namely k-nearest neighbors, multilayer perceptron, support vector machine, and the AdaBoost ensemble learner. The proposed approach can therefore be applied effectively in practice for RNA-seq data analysis, which is useful in many applications related to disease diagnosis and monitoring at the molecular level.
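The gene-selection step described above can be illustrated with a minimal sketch. The abstract does not name the transformation used, so the log2 counts-per-million transform below is an assumption, as are the Welch (unequal-variance) form of the t-statistic, the toy data, and the helper names `log_cpm`, `welch_t`, and `select_genes`, all of which are hypothetical.

```python
import math
from statistics import mean, variance


def log_cpm(counts, prior=0.5):
    """Convert one sample's raw read counts to log2 counts-per-million.

    Assumption: a simple log-CPM transform stands in for the paper's
    "microarray-like" transformation, which the abstract does not specify.
    """
    lib_size = sum(counts)
    return [math.log2((c + prior) / (lib_size + 1.0) * 1e6) for c in counts]


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch form, an assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(samples, labels, k):
    """Return indices of the k genes with the largest |t| between classes 0 and 1."""
    logged = [log_cpm(s) for s in samples]
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, y in zip(logged, labels) if y == 0]
        b = [row[g] for row, y in zip(logged, labels) if y == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


# Toy data (made up): 6 samples x 4 genes; gene 2 is much higher in class 1.
samples = [
    [100, 50, 10, 200],
    [110, 45, 12, 190],
    [95, 55, 9, 210],
    [105, 48, 90, 205],
    [98, 52, 95, 195],
    [102, 50, 85, 200],
]
labels = [0, 0, 0, 1, 1, 1]
top = select_genes(samples, labels, k=1)
```

In a full pipeline, the columns indexed by `top` would be the inputs to the classifier. When combined with cross-validation, running the selection on the training folds only avoids leaking test labels into the gene ranking.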
