Abstract
Background: An integrative multi-omics analysis approach that combines multiple types of omics data, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics, has become increasingly popular for understanding the pathophysiology of complex diseases. Although many multi-omics analysis methods have been developed for complex disease studies, only a few simulation tools are available that simulate multiple types of omics data and model their relationships with disease status, and these tools have limitations in simulating multi-omics data.

Results: We developed the multi-omics data simulator OmicsSIMLA, which simulates genomics (i.e., single-nucleotide polymorphisms [SNPs] and copy number variations), epigenomics (i.e., bisulphite sequencing), transcriptomics (i.e., RNA sequencing), and proteomics (i.e., normalized reverse phase protein array) data at the whole-genome level. Furthermore, the relationships between different types of omics data, such as methylation quantitative trait loci (SNPs influencing methylation), expression quantitative trait loci (SNPs influencing gene expression), and expression quantitative trait methylations (methylations influencing gene expression), were modeled. More importantly, the relationships between these multi-omics data and the disease status were modeled as well. We used OmicsSIMLA to simulate a multi-omics dataset for breast cancer under a hypothetical disease model and used the data to compare the performance of existing multi-omics analysis methods in terms of disease classification accuracy and runtime. We also used OmicsSIMLA to simulate a multi-omics dataset with a scale similar to that of an ovarian cancer multi-omics dataset. The neural network–based multi-omics analysis method ATHENA was applied to both the real and simulated data, and the results were compared. Our results demonstrated that complex disease mechanisms can be simulated by OmicsSIMLA, and ATHENA showed the highest prediction accuracy when the effects of multi-omics features (e.g., SNPs, copy number variations, and gene expression levels) on the disease were strong. Furthermore, similar results can be obtained from ATHENA when analyzing the simulated and real ovarian multi-omics data.

Conclusions: OmicsSIMLA will be useful for evaluating the performance of different multi-omics analysis methods. Sample sizes and power can also be calculated by OmicsSIMLA when planning a new multi-omics disease study.
Highlights
Complex diseases such as hypertension, type 2 diabetes, and autism are caused by multiple genetic and environmental factors (Timpson et al 2018)
We demonstrated the usefulness of OmicsSIMLA by simulating a multi-omics dataset for breast cancer under a hypothetical disease model, and compared the performance among existing multi-omics analysis tools based on the data
The epigenomics data are the methylated and total read counts at CpGs based on bisulphite sequencing, simulated using the pWGBSSimla algorithm incorporating methylation profiles for 29 human cell and tissue types (Chung and Kang 2018)
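To make the idea of methylated and total read counts at a CpG concrete, the following is a minimal sketch of how such counts could be drawn for a single site whose methylation level depends on a SNP genotype (a methylation quantitative trait locus). It is not the pWGBSSimla or OmicsSIMLA algorithm: the function names, the additive effect of the genotype on the methylation proportion, the uniform read depth, and all default parameter values are assumptions chosen only for illustration.

```python
import random


def simulate_genotype(maf, rng):
    """Draw a minor-allele count (0, 1, or 2) assuming Hardy-Weinberg equilibrium."""
    return sum(rng.random() < maf for _ in range(2))


def simulate_cpg_reads(genotype, rng, base_level=0.3, meqtl_effect=0.2,
                       min_depth=10, max_depth=40):
    """Return (methylated, total) read counts for one CpG in one sample.

    Assumption: each minor allele shifts the methylation proportion
    additively by `meqtl_effect`, clipped to the interval [0, 1].
    """
    level = min(max(base_level + meqtl_effect * genotype, 0.0), 1.0)
    # Assumption: read depth is uniform between min_depth and max_depth.
    total = rng.randint(min_depth, max_depth)
    # Each read is methylated independently with probability `level`
    # (a binomial draw written out as a sum of Bernoulli trials).
    methylated = sum(rng.random() < level for _ in range(total))
    return methylated, total


def simulate_samples(n, maf=0.3, seed=1):
    """Simulate (genotype, methylated, total) tuples for n samples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        genotype = simulate_genotype(maf, rng)
        methylated, total = simulate_cpg_reads(genotype, rng)
        rows.append((genotype, methylated, total))
    return rows


if __name__ == "__main__":
    for genotype, methylated, total in simulate_samples(5):
        print(genotype, methylated, total)
```

Under these assumptions, samples carrying more minor alleles tend to show a higher fraction of methylated reads, which is the kind of SNP-to-methylation relationship a downstream analysis method would be expected to detect.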
Summary
Complex diseases such as hypertension, type 2 diabetes, and autism are caused by multiple genetic and environmental factors (Timpson et al 2018). Genome-wide association studies have identified many genetic variants (i.e., SNPs) associated with complex diseases. However, it remains difficult to understand the roles of the associated SNPs in the molecular pathophysiology of a disease and how the SNPs interact with other SNPs in a biological network (Karczewski and Snyder 2018). Because a single type of data generally cannot capture the complexity of the molecular events causing a disease, an integrative approach that combines multi-omics data would be ideal for elucidating its pathophysiology (Karczewski and Snyder 2018).