Abstract

This paper introduces an approach to classifying RNA-seq read counts using grey relational analysis (GRA) and Bayesian Gaussian process (GP) models. Read counts are transformed to microarray-like data so that normal-based statistical methods can be applied. GRA selects differentially expressed genes by integrating the outcomes of five individual feature selection methods: the two-sample t-test, the entropy test, the Bhattacharyya distance, the Wilcoxon test and the receiver operating characteristic curve. GRA acts as an aggregate filter method that combines the advantages of the individual methods to produce significant feature subsets, which are then fed into a nonparametric GP model for classification. The proposed approach is verified on two benchmark real datasets using five-fold cross-validation. Experimental results show that the GRA-based feature selection method and the GP classifier outperform their competing methods. Moreover, GRA-GP considerably outperforms the sparse Poisson linear discriminant analysis classifiers, which were introduced specifically for read counts, across different numbers of features. The proposed approach can therefore be applied effectively in practice to read count data analysis, which is useful in many applications including understanding disease pathogenesis, diagnosis and treatment monitoring at the molecular level.
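The abstract does not say which transformation maps read counts to microarray-like values. As a minimal sketch only, the following assumes a log2 counts-per-million transform with a small offset, a common choice for making count data closer to normal. The function name `log2_cpm` and the `prior_count` parameter are hypothetical and are not taken from the paper.

```python
import math


def log2_cpm(counts, prior_count=0.5):
    """Map a genes x samples table of read counts to log2 counts-per-million.

    ASSUMPTION: the paper's actual transformation is not given in this
    excerpt; log2-CPM with an offset is used here purely for illustration.
    """
    n_samples = len(counts[0])
    # Library size = total number of reads observed in each sample.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            # The offsets avoid log2(0) for genes with zero reads.
            math.log2((row[j] + prior_count) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Two genes (rows) measured in two samples (columns).
example_counts = [[0, 10], [99, 90]]
transformed = log2_cpm(example_counts)
```

The transformed values are continuous and comparable across samples with different sequencing depths, which is what normal-based tests such as the t-test require.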

Highlights

  • Discovery of genes that are differentially expressed is helpful in gaining insights into disease pathogenesis, and discovering biomarkers for diagnosing and predicting the clinical status of patients

  • We introduce an aggregate feature selection method based on the grey relational analysis (GRA) technique [23] to deal with transformed RNA sequencing (RNA-seq) data

  • GRA-based gene selection is employed for RNA-seq data to select genes that are differentially expressed for classification
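To illustrate how an aggregate filter of this kind can combine several per-gene scores, here is a minimal sketch of a textbook grey relational grade computation. It assumes each column holds one method's score per gene, oriented so that larger means more discriminative, and it uses the conventional distinguishing coefficient of 0.5. The function names, the min-max normalization and the example numbers are illustrative assumptions, not the paper's exact procedure.

```python
def min_max_normalize(column):
    """Rescale one method's scores to [0, 1]; larger is treated as better."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def grey_relational_grades(scores, zeta=0.5):
    """Return one grey relational grade per gene.

    scores: genes x methods table (list of lists).
    zeta:   distinguishing coefficient (0.5 is the conventional default).
    ASSUMPTION: the reference sequence is the ideal score 1.0 for every method.
    """
    n_genes, n_methods = len(scores), len(scores[0])
    columns = [
        min_max_normalize([row[k] for row in scores]) for k in range(n_methods)
    ]
    # Deviation of each gene from the ideal reference, per method.
    deltas = [
        [1.0 - columns[k][i] for k in range(n_methods)] for i in range(n_genes)
    ]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return [1.0] * n_genes
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / n_methods)
    return grades


def select_top_genes(scores, n_select, zeta=0.5):
    """Indices of the n_select genes with the highest grades."""
    grades = grey_relational_grades(scores, zeta)
    order = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
    return order[:n_select]


# Hypothetical scores for three genes from three filter methods.
example_scores = [[5.0, 0.9, 2.0], [1.0, 0.5, 0.5], [3.0, 0.7, 1.0]]
example_grades = grey_relational_grades(example_scores)
top_two = select_top_genes(example_scores, 2)
```

A gene that ranks highest under every method receives a grade of 1, while genes that score well under only some methods are pulled toward lower grades, so the ranking reflects agreement among the individual filters.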


Summary

Introduction

Discovery of genes that are differentially expressed is helpful in gaining insights into disease pathogenesis, and in discovering biomarkers for diagnosing and predicting the clinical status of patients. Gene biomarkers are often identified using DNA microarrays, which measure gene expression across the entire human genome. DNA microarray technology suffers from cross-hybridization, which yields noisy gene expression profiles. RNA sequencing (RNA-seq) has been emerging as a favored alternative to microarray technology [1]. RNA-seq is a technique that is capable of generating RNA-seq count data based on the ...

PLOS ONE | DOI:10.1371/journal.pone.0164766 | October 26, 2016

