Abstract

Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel feature extraction method intended to help move gene expression profiling technologies from bench to bedside. We define a ‘correlation feature space’ for samples based on their gene expression profiles by iteratively applying Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can highlight the latent patterns underlying noisy gene expression data and thereby greatly improve the robustness and accuracy of currently available algorithms for disease diagnosis and classification based on gene expression profiles.

Highlights

  • With the rapid development of high-throughput technologies, gene expression profiling based on microarrays or next-generation sequencing techniques has been widely applied in clinical research [1,2,3,4,5,6,7,8,9].

  • We propose a novel feature extraction method, named iPcc, that extracts the underlying patterns from noisy data sets by introducing the concept of a ‘correlation feature’ computed with iterative Pearson correlation coefficients.

  • Simulations and evaluations on real data sets demonstrate that iPcc greatly improves disease class discovery and prediction based on gene expression profiles.
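
The idea of a correlation feature space built by iterating Pearson's correlation coefficient can be illustrated with a minimal sketch. This is not the authors' implementation: the text here only states that Pearson's correlation coefficient is applied iteratively to the samples, so the function name `iterative_pearson_features`, the fixed iteration count `n_iter`, and the toy data are all assumptions made for illustration. Each pass replaces a sample's feature vector with its vector of correlations to every sample.

```python
import numpy as np


def iterative_pearson_features(expr, n_iter=2):
    """Hypothetical sketch of an iterated correlation feature space.

    expr   : array of shape (n_samples, n_genes), one row per sample.
    n_iter : number of correlation passes (an assumption; the text
             does not specify a stopping rule).
    Returns an (n_samples, n_samples) matrix whose row i holds the
    correlation features of sample i.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    feats = np.asarray(expr, dtype=float)
    for _ in range(n_iter):
        # np.corrcoef treats each row as one variable, so this yields the
        # sample-by-sample Pearson correlation matrix. Rows of that matrix
        # become the feature vectors for the next pass.
        # Note: a row with zero variance would produce NaN values.
        feats = np.corrcoef(feats)
    return feats


# Toy data (assumed): two groups of 5 samples, each group sharing a
# latent 200-gene pattern that is obscured by independent noise.
rng = np.random.default_rng(0)
pattern_a = rng.normal(size=200)
pattern_b = rng.normal(size=200)
group_a = pattern_a + rng.normal(scale=1.0, size=(5, 200))
group_b = pattern_b + rng.normal(scale=1.0, size=(5, 200))
expr = np.vstack([group_a, group_b])

features = iterative_pearson_features(expr, n_iter=3)

# Average similarity within the first group versus across the two groups.
within = features[:5, :5].mean()
between = features[:5, 5:].mean()
print("within-group mean:", within)
print("between-group mean:", between)
```

In this toy setting, samples that share a latent pattern end up with more similar correlation features than samples from different groups, which is the kind of pattern highlighting the highlights describe. How many passes to run, and whether to stop on convergence, is left open here because the text does not say.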

Summary

Introduction

With the rapid development of high-throughput technologies, gene expression profiling based on microarrays or next-generation sequencing techniques has been widely applied in clinical research [1,2,3,4,5,6,7,8,9]. Its major advantage, the simultaneous measurement of the expression levels of thousands of genes, facilitates informative and accurate disease diagnosis and classification. However, it also incorporates many irrelevant genes, producing a feature vector with extremely high dimensionality [10,11]. High-throughput technologies produce more and more information about samples, allowing unbiased investigation of the molecular basis of various biomedical phenomena. For example, compared with traditional gene expression profiling microarrays, RNA-Seq (deep sequencing of the transcriptomes of samples) can detect the expression levels of novel genes that are not annotated in the reference genomes [8].
