Abstract

Background: Integrative analysis of multi-omics data is becoming increasingly important to unravel functional mechanisms of complex diseases. However, the currently available multi-omics datasets inevitably suffer from missing values due to technical limitations and various experimental constraints. These missing values severely hinder integrative analysis of multi-omics data. Current imputation methods mainly use single-omics data and ignore the biological interconnections and information embedded in multi-omics datasets.

Results: In this study, we propose a novel multi-omics imputation method that integrates multiple correlated omics datasets to improve imputation accuracy. Our method was designed to: 1) combine estimates of each missing value obtained from the omics dataset itself with estimates obtained from the other omics datasets, and 2) simultaneously impute multiple omics datasets with missing values using an iterative algorithm. We compared our method with five single-omics imputation methods at different noise levels, sample sizes and missing rates. The results consistently demonstrated the advantage and efficiency of our method in terms of imputation error and recovery of the mRNA-miRNA network structure.

Conclusions: We conclude that our proposed imputation method can use more biological information to minimize the imputation error and can thus improve the performance of downstream analyses such as genetic regulatory network construction.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1122-6) contains supplementary material, which is available to authorized users.

Highlights

  • Integrative analysis of multi-omics data is becoming increasingly important to unravel functional mechanisms of complex diseases

  • Simulation data were derived from a glioma study in The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov/) database, containing 50 subjects with 5939 mRNAs, 104 microRNAs and 5013 DNA methylation sites

  • Experimental results confirmed the advantage of our multi-omics based method over five single-omics imputation methods (KNNimpute, Bayesian principal component analysis (BPCA), SVDimpute, local least squares imputation (LLS) and iterative local least squares (iLLS)), consistently across all three scenarios, in terms of lower normalized root mean squared error (NRMSE)
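The NRMSE mentioned in the last highlight can be illustrated with a small sketch. It assumes one definition that is common in the imputation literature (root mean squared error over the hidden entries, divided by the standard deviation of their true values); the exact normalization used in the paper is not stated in this excerpt. The data matrix, the 10% missing rate and the row-mean baseline are hypothetical and stand in for a real omics dataset and a real imputation method.

```python
import numpy as np


def nrmse(true_values, imputed_values, missing_mask):
    """NRMSE computed only over the entries that were hidden.

    Assumed definition: sqrt(mean squared error / variance of the true values).
    """
    truth = true_values[missing_mask]
    guess = imputed_values[missing_mask]
    return float(np.sqrt(np.mean((guess - truth) ** 2) / np.var(truth)))


# Hypothetical complete matrix (features x samples); not real omics data.
rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 50))

# Hide roughly 10% of the entries completely at random.
missing_mask = rng.random(complete.shape) < 0.10
observed = np.where(missing_mask, np.nan, complete)

# Naive baseline: fill each hidden entry with the mean of its row's observed values.
row_means = np.nanmean(observed, axis=1, keepdims=True)
baseline = np.where(missing_mask, row_means, complete)

score = nrmse(complete, baseline, missing_mask)
print(f"row-mean baseline NRMSE: {score:.3f}")
```

A perfect imputation gives an NRMSE of 0, while a method that does no better than filling in a constant scores close to 1, so lower values indicate more accurate imputation.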


Summary

Introduction

Integrative analysis of multi-omics data is becoming increasingly important to unravel functional mechanisms of complex diseases. Because of technical limitations of high-throughput technologies and constraints in experimental design, missing values remain an inevitable and prevalent problem in large-scale profiling experiments [1], and the currently available multi-omics datasets are no exception. A number of studies have indicated that missing values in large-scale omics data can drastically hinder downstream analyses, such as unsupervised clustering of genes [12], detection of differentially expressed genes [13], supervised classification of clinical samples [14], construction of gene regulatory networks [15], genome-wide association studies [16] and detection of differentially methylated regions [17]. Imputing the missing values is therefore an essential step before performing integrative analysis of multi-omics data.
