IntegRATE: a desirability-based data integration framework for the prioritization of candidate genes across heterogeneous omics and its application to preterm birth

Haley R Eidem, Jacob L Steenwyk, John A Capra, Antonis Rokas, Patrick Abbot, Jennifer H Wisecaver

doi:10.1186/s12920-018-0426-y

Open Access

https://doi.org/10.1186/s12920-018-0426-y

Abstract

Background: The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still manually select candidate genes by examining the intersection of lists of candidates stemming from analyses of different types of omics data that have been generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, increasing the chance of missing potentially important candidates.

Methods: To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB).

Results: We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than based on imposition of hard thresholds of key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB as well as novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE).

Conclusions: Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.
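The contrast the abstract draws between hard-threshold intersection and desirability-based ranking can be illustrated with a small sketch. This is not the integRATE implementation or its API (integRATE is an R package); it is a toy Python example under stated assumptions. The gene names, P-values, fold changes, cut points, and the choice to combine scores by a simple arithmetic mean (with a gene absent from a study contributing zero) are all hypothetical and chosen only for illustration.

```python
def d_low(x, lo, hi, scale=1.0):
    """Desirability when smaller is better (e.g. a P-value).

    Returns 1.0 at or below `lo`, 0.0 at or above `hi`, and a value
    between 0 and 1 in between. `scale` bends the curve (1.0 = linear).
    """
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return ((hi - x) / (hi - lo)) ** scale


def d_high(x, lo, hi, scale=1.0):
    """Desirability when larger is better (e.g. |log2 fold change|)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return ((x - lo) / (hi - lo)) ** scale


# Hypothetical toy data: study -> gene -> (P-value, |log2 fold change|).
STUDIES = {
    "transcriptomics": {
        "GENE_A": (0.04, 1.2),
        "GENE_B": (0.06, 2.5),
        "GENE_C": (0.50, 0.1),
    },
    "epigenomics": {
        "GENE_A": (0.03, 1.1),
        "GENE_B": (0.07, 2.8),
    },
}


def study_desirability(p_value, abs_log2_fc):
    """Within-study score: mean of the per-variable desirabilities.

    The cut points (0 to 1 for P, 0 to 3 for fold change) are assumptions.
    """
    d_p = d_low(p_value, lo=0.0, hi=1.0)
    d_fc = d_high(abs_log2_fc, lo=0.0, hi=3.0)
    return (d_p + d_fc) / 2.0


def integrate(studies):
    """Across-study score: mean over all studies; missing genes add 0."""
    genes = set().union(*(results.keys() for results in studies.values()))
    scores = {}
    for gene in genes:
        total = 0.0
        for results in studies.values():
            if gene in results:
                total += study_desirability(*results[gene])
        scores[gene] = total / len(studies)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def hard_threshold_intersection(studies, p_cut=0.05, fc_cut=1.0):
    """Genes passing both strict cutoffs in every study."""
    passing = [
        {g for g, (p, fc) in results.items() if p < p_cut and fc >= fc_cut}
        for results in studies.values()
    ]
    return set.intersection(*passing)


ranking = integrate(STUDIES)
shortlist = hard_threshold_intersection(STUDIES)
print("desirability ranking:", [gene for gene, _ in ranking])
print("hard-threshold intersection:", sorted(shortlist))
```

In this toy data, GENE_B narrowly misses the P < 0.05 cutoff in both studies and so never reaches the intersection list, yet its large fold changes place it first in the desirability ranking. That is the kind of candidate the abstract warns hard thresholds can miss.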

Highlights

The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases
Although we focus on using desirability functions to integrate heterogeneous omics data corresponding to complex human diseases, integRATE can be applied to data sets from any phenotype, species, and data type
In total, our spontaneous preterm birth (sPTB) analyses integrated gene-based results from 10 omics studies (1 genomics, 4 transcriptomics, 4 epigenomics, and 1 proteomics; Table 1) and included data sets ranging from 422 genes [35] to 20,841 genes [42]

Summary

Introduction

The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. One integrative study design is to obtain diverse types of omics data from the same tissue samples or patient cohorts, allowing their vertical integration. Alternatively, a single type of omics data can be collected from a variety of tissue samples or patient cohorts, facilitating their horizontal integration across many samples, which can substantially increase the experiment’s power (Fig. 1a, top right). In both vertical and horizontal integration study designs, the availability of diverse types of omics data from the same samples enables the use of a variety of statistical integration approaches (Fig. 1a, bottom) [8]. Multi-staged integration uses multiple steps to first identify associations between different data types and then identify associations between data types and the phenotype of interest [9], whereas meta-dimensional integration combines data simultaneously based on concatenation, transformation, or model building [10].

Journal: BMC Medical Genomics	Publication Date: Nov 19, 2018
Citations: 4	License type: open-access
