Abstract

Background: In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty of identifying them often comes from patient heterogeneity and sample variability, which can hide the biomedically relevant changes that characterize each stage and make standard differential analysis inadequate or inefficient.

Results: We propose a methodology to study diseases or disease stages ordered in a sequential manner (e.g. from early stages with good prognosis to more acute or serious stages associated with poor prognosis). The methodology is applied to diseases that have been studied by obtaining genome-wide expression profiles of cohorts of patients at different stages. The approach allows searching for consistent expression patterns along the progression of the disease through two major steps: (i) identifying genes with increasing or decreasing trends in the progression of the disease; (ii) clustering the increasing/decreasing gene expression patterns using an unsupervised approach to reveal whether there are consistent patterns and find genes altered at specific disease stages. The first step is carried out using Gamma rank correlation to identify genes whose expression correlates with a categorical variable that represents the stages of the disease. The second step is done using a Self Organizing Map (SOM) to cluster the genes according to their progressive profiles and identify specific patterns. Both steps are done after normalization of the genomic data to allow the integration of multiple independent datasets. In order to validate the results and evaluate their consistency and biological relevance, the methodology is applied to datasets of three different diseases: myelodysplastic syndrome, colorectal cancer and Alzheimer’s disease. A software script written in R, named genediseasePatterns, is provided to allow the use and application of the methodology.

Conclusion: The method presented allows the analysis of the progression of complex and heterogeneous diseases that can be divided into pathological stages. It identifies gene groups whose expression patterns change along the advance of the disease, and it can be applied to different types of genomic data studying cohorts of patients in different states.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1290-4) contains supplementary material, which is available to authorized users.
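The first step of the approach (ranking genes by how consistently their expression rises or falls across ordered stages) can be illustrated with a small sketch. It assumes that "Gamma rank correlation" refers to Goodman and Kruskal's gamma, computed as (C - D) / (C + D) over concordant (C) and discordant (D) sample pairs, with tied pairs ignored. The function name `goodman_kruskal_gamma` and the toy values are hypothetical illustrations; this is not the code of the genediseasePatterns script.

```python
from itertools import combinations


def goodman_kruskal_gamma(stages, expression):
    """Return gamma = (C - D) / (C + D) for one gene.

    stages     : ordinal stage code of each sample (e.g. 1, 2, 3)
    expression : expression value of the gene in each sample
    Pairs tied on stage or on expression are ignored.
    """
    concordant = 0
    discordant = 0
    for (s1, e1), (s2, e2) in combinations(zip(stages, expression), 2):
        direction = (s1 - s2) * (e1 - e2)
        if direction > 0:
            concordant += 1      # expression changes in the same direction as stage
        elif direction < 0:
            discordant += 1      # expression changes against the stage order
    if concordant + discordant == 0:
        return 0.0               # no informative pairs (assumed convention)
    return (concordant - discordant) / (concordant + discordant)


# Toy data (hypothetical): six samples, two per stage.
stages = [1, 1, 2, 2, 3, 3]
rising = [0.1, 0.3, 0.5, 0.4, 0.9, 1.2]   # increases with stage
falling = [-v for v in rising]            # decreases with stage
mixed = [0.1, 0.6, 0.5, 0.4, 0.9, 0.3]    # no clear trend

gamma_rising = goodman_kruskal_gamma(stages, rising)
gamma_falling = goodman_kruskal_gamma(stages, falling)
gamma_mixed = goodman_kruskal_gamma(stages, mixed)
print(gamma_rising, gamma_falling, gamma_mixed)
```

Genes with gamma close to +1 or -1 would be the candidates with increasing or decreasing trends; any cut-off on gamma is a choice left to the analyst and is not specified here.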

Highlights

  • In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease

  • The method is generalized to be applicable to the study of other diseases with stages, and here we illustrate its application to two other experimental cohorts of patients from Alzheimer’s disease (AD) and from colorectal cancer (CRC), where a clear clinical characterization of the individuals in stages has been done

  • Patterns found along the progression of diseases: three case studies. In the first dataset studied, corresponding to myelodysplastic syndromes (MDS), we applied the methodology using two different ways of grouping the samples: by disease subtypes defined in MDS (6-stage contrast), or by risk of transformation into leukemia (4-stage contrast) (Fig. 1)


Summary

Introduction

In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. Rather than using differential expression analysis to look for specific markers for each subtype, our approach is based on non-parametric coexpression profiling along the different stages of the disease, followed by the application of a pattern recognition method. This allows unravelling similarities and identifying specific gene patterns associated with the stages or progression of the disease. The method is generalized to be applicable to the study of other diseases with stages, and here we illustrate its application to two other experimental cohorts of patients from Alzheimer’s disease (AD) and from colorectal cancer (CRC), where a clear clinical characterization of the individuals in stages has been done. All these datasets have been produced with high-density microarray expression platforms; as a validation, we applied the methodology to a simulated RNA-seq dataset where a subset of genes has been modeled to follow progressive changes in several stages.
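The pattern recognition step can be pictured with a minimal Self Organizing Map written in NumPy. This is only a sketch under stated assumptions: the grid size, learning-rate schedule, neighbourhood schedule, and the names `train_som` and `assign_units` are hypothetical, and the simulated per-stage profiles below are toy values, not the datasets or parameters used by the authors.

```python
import numpy as np


def train_som(profiles, n_rows=2, n_cols=2, n_epochs=50, seed=0):
    """Train a small rectangular SOM on gene profiles (genes x stages)."""
    rng = np.random.default_rng(seed)
    n_genes = profiles.shape[0]
    n_units = n_rows * n_cols
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)],
                    dtype=float)
    # Initialise unit weights from randomly chosen input profiles.
    weights = profiles[rng.integers(0, n_genes, size=n_units)].astype(float)
    sigma0 = max(n_rows, n_cols) / 2.0
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac) + 0.01         # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.3    # neighbourhood radius shrinks
        for i in rng.permutation(n_genes):
            x = profiles[i]
            # Best matching unit: closest weight vector in Euclidean distance.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood around the BMU on the unit grid.
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights


def assign_units(profiles, weights):
    """Return, for each gene profile, the index of its closest SOM unit."""
    d2 = ((profiles[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Toy profiles (hypothetical): mean expression per stage for 20 genes, 4 stages.
rng = np.random.default_rng(1)
rising = np.linspace(-1, 1, 4) + rng.normal(0, 0.05, size=(10, 4))
falling = np.linspace(1, -1, 4) + rng.normal(0, 0.05, size=(10, 4))
profiles = np.vstack([rising, falling])

weights = train_som(profiles)
labels = assign_units(profiles, weights)
print(labels)
```

Each SOM unit then represents one candidate expression pattern along the stages, and the genes assigned to a unit form the group sharing that pattern. In practice the profiles would be normalized first, as the abstract notes, so that datasets from different sources are comparable.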
