TiMEG: an integrative statistical method for partially missing multi-omics data

Sarmistha Das, Indranil Mukhopadhyay

doi:10.1038/s41598-021-03034-z

Journal: Scientific Reports	Publication Date: Dec 1, 2021
Citations: 11	License type: open-access

Affiliation: Indian Statistical Institute

Abstract

Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, partially missing individual-level observations pose a major challenge for data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case–control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
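
The abstract does not spell out TiMEG's likelihood, so the following is only a minimal sketch of the general idea of a likelihood that uses partially missing observations without imputing them. It assumes, purely for illustration, that expression and methylation are jointly Gaussian given genotype; the function name `observed_loglik` and its parameters are hypothetical and are not taken from the paper. Each individual contributes the marginal density of whichever omics values were actually observed.

```python
import math
import numpy as np

def observed_loglik(g, e, m, beta_e, beta_m, cov):
    """Observed-data Gaussian log-likelihood (illustrative, not TiMEG itself).

    g      : genotypes, observed for everyone
    e, m   : expression and methylation, with NaN marking a missing value
    beta_e : (intercept, slope) of expression on genotype  (assumed model)
    beta_m : (intercept, slope) of methylation on genotype (assumed model)
    cov    : 2x2 covariance of (expression, methylation) given genotype
    """
    cov = np.asarray(cov, dtype=float)
    total = 0.0
    for gi, ei, mi in zip(g, e, m):
        mean = np.array([beta_e[0] + beta_e[1] * gi,
                         beta_m[0] + beta_m[1] * gi])
        y = np.array([ei, mi], dtype=float)
        obs = ~np.isnan(y)            # which omics were measured for this person
        if not obs.any():
            continue                  # nothing observed: contributes log(1) = 0
        r = y[obs] - mean[obs]        # residual on the observed coordinates
        S = cov[np.ix_(obs, obs)]     # marginal covariance of those coordinates
        k = int(obs.sum())
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (k * math.log(2 * math.pi) + logdet
                         + r @ np.linalg.solve(S, r))
    return total

# Toy data: genotype complete, expression/methylation missing for different people.
nan = float("nan")
g = [0, 1, 2, 1]
e = [0.1, nan, 1.9, nan]
m = [0.0, 0.4, nan, nan]
ll = observed_loglik(g, e, m, beta_e=(0.0, 1.0), beta_m=(0.0, 0.5),
                     cov=[[1.0, 0.3], [0.3, 1.0]])
print(ll)
```

In this sketch, individuals with only one assay still inform the parameters through the corresponding marginal, and the off-diagonal covariance term is what lets one omics layer borrow strength from the other when both are observed.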

Highlights

Multi-omics data integration is widely used to understand the genetic architecture of disease.
When the sample size is limited, missing individual-level information on multiple assays causes a substantial loss of information.
Imputation may introduce bias in such small samples because the percentage of missing data is large.

Summary

Introduction

Multi-omics data integration is widely used to understand the genetic architecture of disease. We develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case–control paradigm by integrating genotype, gene expression, and methylation data, especially in the presence of missing omics data. To understand the genetic architecture of disease, genome-wide association studies and several other studies based on single-omic data such as gene expression or DNA methylation have catalogued many disease-associated loci. Integration methods combine multiple omics data from large consortiums of different cohorts [15, 24]. Such methods are prone to spurious prioritisation of associated genes owing to substantial cross-cell-type variation [25]. For these reasons, and to reduce the stratification bias due to population diversity, increasing attempts have recently been made to create large-scale multi-omics datasets by combining multiple assays from the same set of samples. Gene expression and/or methylation assays are rarely repeated for generating the missing data due to various reasons such as the huge cost
