Abstract

Background

The increasing multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. Unlike the continuous intensity measurements in most omics data sets, phenome data contain clinical variables that are binary, ordinal and categorical.

Results

In this paper we introduce an integrative phenotyping framework (iPF) for disease subtype discovery. A feature topology plot was developed for effective dimension reduction and visualization of multi-omics data. The approach is free of model assumptions and robust to noise and missing data. We developed a workflow to integrate homogeneous patient clustering from different omics data in an agglomerative manner and then visualized heterogeneous clustering of pairwise omics sources. We applied the framework to two batches of lung samples obtained from patients diagnosed with chronic obstructive lung disease (COPD) or interstitial lung disease (ILD) with well-characterized clinical (phenomic) data, mRNA and microRNA expression profiles. Application of iPF to the first training batch identified clusters of patients with homogeneous disease phenotypes as well as clusters with intermediate disease characteristics. Analysis of the second batch revealed a similar data structure, confirming the presence of intermediate clusters. Genes in the intermediate clusters were enriched with inflammatory and immune functional annotations, suggesting that they represent mechanistically distinct disease subphenotypes that may respond to immunomodulatory therapies. The iPF software package and all source code are publicly available.

Conclusions

Identification of subclusters with distinct clinical and biomolecular characteristics suggests that integration of phenomic and other omics information could lead to identification of novel mechanism-based disease subphenotypes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-2170-4) contains supplementary material, which is available to authorized users.

Highlights

  • Chronic obstructive lung disease (COPD) is classified by the Global Initiative for Chronic Obstructive Lung Disease criteria in four major categories based on symptoms, airflow obstruction, and exacerbation history [1]

  • A distance matrix between any two features within and across omics data sets is defined (Fig. 2b); (3) Dimension reduction: Multidimensional scaling (MDS) is applied to map all features to a two-dimensional Euclidean space (Fig. 2c); (4) Feature smoothing: Feature intensities are smoothed in the reduced 2D space for each patient (Fig. 2d); (5) Clustering for subtype discovery and visualization: Unsupervised clustering analysis is performed to identify potential disease subtypes, and feature intensities within each cluster are averaged to generate representative plots for each cluster (Fig. 2e)
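
The steps listed in the last highlight (feature distance matrix, MDS to two dimensions, per-patient smoothing, clustering with per-cluster averaging) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the iPF package's implementation: the correlation-based distance, the classical (Torgerson) form of MDS, the Gaussian kernel smoother and its `bandwidth`, and the small k-means routine are all stand-ins chosen for brevity, and every function name here is hypothetical.

```python
import numpy as np


def feature_distance(X):
    """Distance between features (rows of X, shape features x patients).

    Assumption: 1 - |Pearson correlation|; the paper defines its own distance.
    Rows must not be constant, otherwise the correlation is undefined.
    """
    D = 1.0 - np.abs(np.corrcoef(X))
    np.fill_diagonal(D, 0.0)
    return D


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: embed items so Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def smooth_features(X, coords, bandwidth=0.5):
    """Gaussian-kernel smoothing of each patient's intensities over the 2D feature map."""
    sq = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * bandwidth ** 2))
    W /= W.sum(axis=1, keepdims=True)             # each row of weights sums to 1
    return W @ X


def kmeans(P, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of P; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)].astype(float)
    labels = np.zeros(len(P), dtype=int)
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # leave empty clusters where they are
                centers[j] = P[labels == j].mean(axis=0)
    return labels


def run_sketch(X, n_clusters=2):
    """Run the four steps on X (features x patients) and average features per cluster."""
    D = feature_distance(X)
    coords = classical_mds(D)                     # one 2D point per feature
    smoothed = smooth_features(X, coords)
    labels = kmeans(smoothed.T, n_clusters)       # cluster patients, not features
    cluster_means = {
        int(c): smoothed[:, labels == c].mean(axis=1) for c in np.unique(labels)
    }
    return coords, smoothed, labels, cluster_means
```

Here `X` would hold the features from all omics sources stacked row-wise, one column per patient. Plotting each entry of `cluster_means` at the positions in `coords` gives a rough analogue of the representative per-cluster plots described above.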

Summary

Introduction

The increasing multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. COPD is a lung disease caused by repeated exposure to a noxious agent, resulting in irreversible airflow limitation. The term Interstitial Lung Disease (ILD) designates a loosely defined group of patients characterized by changes in the interstitium of the lung that cause pulmonary restriction and impaired gas exchange. This group includes Idiopathic Pulmonary Fibrosis (IPF), Nonspecific Interstitial Pneumonia (NSIP), Hypersensitivity Pneumonitis (HP), Cryptogenic Organizing Pneumonia (COP), Respiratory Bronchiolitis-associated Interstitial Lung Disease (RB-ILD), Collagen Vascular Disease-associated Interstitial Lung Disease (CVD-ILD), Desquamative Interstitial Pneumonia (DIP) and Acute Interstitial Pneumonia (AIP), among others.
