Integrative Multi-Omics Approaches for Personalized Medicine and Health
Multi-omics data integration enhances personalized medicine by providing comprehensive insights into disease mechanisms through genomic, transcriptomic, proteomic, and metabolomic analyses, enabling patient-specific diagnostics and therapeutics; this approach improves treatment outcomes and aims to democratize precision healthcare.
Introduction: Multi-omics data integration has transformed personalized medicine, providing a comprehensive understanding of disease mechanisms and informing precision therapeutic options. Multi-omics data generated for the same samples or patients can reveal the flow of biological information across several levels, and thereby give in-depth information on the molecular mechanisms underlying pathological conditions. Multi-omics integration plays a pivotal role in personalized medicine by providing comprehensive insights into the complex biological systems of individual patients. This review gives a comprehensive account of current and future progress in multi-omics methodologies that integrate genomic and transcriptomic analyses, proteomic approaches, and metabolomic screens, and that promise to refine diagnostic and therapeutic strategies. Methods: A literature search was performed in PubMed using keywords such as genomics, proteomics, transcriptomics, metabolomics, multi-omics, and precision medicine to identify published research articles. All results were then reviewed thoroughly, and their findings and conclusions were compiled and summarized. Results: By analyzing multiple omics layers, such as genomics, transcriptomics, proteomics, and metabolomics, multi-omics approaches enable the identification of patient-specific molecular traits and the discovery of new clinical therapeutics. Integrating these data types augments diagnostics, optimizes therapeutic regimens, and supports personalized medicine tailored to an individual patient's profile. Conclusion: Multi-omics data integration and its applications in fields such as cancer research help optimize patient-specific treatment and improve patient health. As these technologies reach more people, they stand to democratize precision medicine and, ideally, help bridge health disparities.
In conclusion, the present review highlights multi-omics data integration as a transformative step towards personalized medicine, ultimately shifting patient care from empirical to precision or individualized approaches.
- Research Article
6
- 10.1007/s00204-024-03876-2
- Oct 23, 2024
- Archives of Toxicology
Multi-omics data integration has been repeatedly discussed as the way forward to more comprehensively cover the molecular responses of cells or organisms to chemical exposure in systems toxicology and regulatory risk assessment. In Canzler et al. (Arch Toxicol 94(2):371–388. https://doi.org/10.1007/s00204-020-02656-y), we reviewed the state of the art in applying multi-omics approaches in toxicological research and chemical risk assessment. We developed best practices for the experimental design of multi-omics studies, omics data acquisition, and subsequent omics data integration. We found that multi-omics data sets for toxicological research questions were generally rare, with no data sets comprising more than two omics layers adhering to these best practices. Due to these limitations, we could not fully assess the benefits of different data integration approaches or quantitatively evaluate the contribution of various omics layers for toxicological research questions. Here, we report on a multi-omics study on thyroid toxicity that we conducted in compliance with these best practices. We induced direct and indirect thyroid toxicity through Propylthiouracil (PTU) and Phenytoin, respectively, in a 28-day plus 14-day recovery oral rat toxicity study. We collected clinical and histopathological data and six omics layers, including the long and short transcriptome, proteome, phosphoproteome, and metabolome from plasma, thyroid, and liver. We demonstrate that the multi-omics approach is superior to single-omics in detecting responses at the regulatory pathway level. We also show how combining omics data with clinical and histopathological parameters facilitates the interpretation of the data. Furthermore, we illustrate how multi-omics integration can hint at the involvement of non-coding RNAs in post-transcriptional regulation. 
Also, we show that multi-omics facilitates grouping, and we assess how much information individual and combinations of omics layers contribute to this approach.
- Front Matter
1
- 10.3389/fgene.2024.1487893
- Sep 25, 2024
- Frontiers in genetics
Three years ago, in 2021, our first collection on Application of Novel Statistical and Machine-learning Methods to High-dimensional Clinical Cancer and (Multi-) Omics Data was a highlight for the Frontiers readership, with over 52K views and 13K downloads. It contributed greatly to the field by highlighting cutting-edge research in statistical genetics and methodology. Building on the success of the first volume, we present another collection of insightful and thought-provoking research on this topic, comprising four articles.
In this second volume, we continue our previous focus on the development and application of novel statistical and machine-learning methods for high-dimensional clinical and (multi-)omics data in cancer-related research. With the development of artificial intelligence (AI), especially deep learning (DL), three of the four articles in Volume II investigated methods for multi-omics data integration using DL, while the fourth investigated a new method for sequencing data processing.
With the rapid evolution of DL, significant progress has been made in applying DL-based methods to multi-omics integration. In a review article, Wekesa et al. comprehensively discussed recent trends in using DL techniques for multi-omics data analysis in disease diagnosis, prognosis, and treatment. They focused particularly on multi-omics datasets that involve non-coding RNAs, such as miRNAs and long non-coding RNAs (lncRNAs), which play essential roles in cancer development and research. Several novel DL methods for integration and interpretation were highlighted, including contrastive learning, DeepLIFT, factorization machine deep learning (FMDNN), and graph neural networks (GNNs). Further, they assessed studies combining DL methods with emerging technologies, such as blockchain and the Internet of Things (IoT), in computational biology.
Case studies in breast and brain cancer detection demonstrated how integrating cutting-edge technologies and DL methods could advance cancer research and clinical applications. This review makes clear that the development of innovative methods, algorithms, and analytical frameworks that integrate clinical, multi-omics, and imaging data for cancer research is particularly exciting. Moreover, the authors discussed potential challenges and future prospects, providing valuable insights into the field's future.
In addition to the data types discussed in Volume I, we aim to showcase more studies that analyze imaging data, given the extensive use of imaging techniques in cancer diagnosis, treatment, and research. Zhao et al. developed new models that integrate radiomics data and whole genome sequencing data. Although their prediction outcome focused on proximal femoral strength related to hip fracture, their models can be straightforwardly adapted for imaging analysis in cancer research. Specifically, they extended the DL method of variational autoencoder from a single-view input to a multi-view input approach. Compared with other high-dimensional multi-view information integration algorithms, the proposed model demonstrated superior performance in terms of root mean squared error (RMSE) and the coefficient of determination (R-squared). The significance of the analyzed features/variables was further interpreted through the leave-one-out technique.
Another compelling study in this collection explores a linear dimensionality reduction method using DL. Dimension reduction is a critical step in the analysis of high-dimensional genetic and imaging data, as it helps to extract representative features for visualization or downstream analysis, such as prediction or classification. Li et al. introduced neural principal component analysis (nPCA), which enhances the widely used original principal component analysis (PCA) by retaining the linear information of raw data. This new method was successfully applied to high-dimensional single-cell RNA sequencing datasets of the pancreas. The nPCA method holds promise as an alternative dimension reduction technique for cancer investigators.
The last article in this collection addressed the issue of sequencing data compression. With the reduction in sequencing cost, multi-omics data are increasingly generated through sequencing technologies. While bioinformaticians and biostatisticians often work with processed sequencing files, such as BAM and VCF files, the large raw sequencing files still need to be stored for backup, sharing, and legal requirements. Chen et al. presented a two-step framework for sequencing data compression, achieving up to a four-fold compression ratio compared with gzip, all within an acceptable timeframe. Their tool, repaq, is freely available on GitHub, providing a valuable solution for managing large-scale sequencing data efficiently.
In summary, the Volume II collection of original research, review, and technology papers highlights the latest advancements in the integrative analysis of clinical, imaging, and (multi-)omics cancer data, along with statistical and computational methods for high-dimensional data analysis. Combined with Volume I, we hope these collections will contribute to integrative cancer research and inspire further methodology development in related fields.
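For reference, the standard linear PCA that nPCA is described as enhancing can be sketched in a few lines of NumPy via the singular value decomposition of the centered data matrix. This is plain PCA, not the nPCA method of Li et al.; the function name `pca` and the toy matrix (100 cells by 50 genes driven by two latent factors) are illustrative assumptions.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Standard PCA: project the rows of X (samples x features) onto the top principal axes."""
    # Center each feature so the principal axes pass through the data mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    # Fraction of total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:n_components]

rng = np.random.default_rng(0)
# Toy "expression" matrix: 100 cells x 50 genes generated from 2 latent factors plus small noise.
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(100, 50))

scores, components, explained = pca(X, n_components=2)
print(scores.shape, round(float(explained.sum()), 3))
```

Because the toy data are generated from two factors, the first two components should capture nearly all of the variance; on real single-cell data the retained fraction is typically much lower, which is part of the motivation for alternatives such as nPCA.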
- Research Article
10
- 10.1109/access.2019.2955958
- Jan 1, 2019
- IEEE Access
Rapid advances in high-throughput sequencing technology have led to the generation of a large number of multi-omics biological datasets. Integrating data from different omics provides an unprecedented opportunity to gain insight into disease mechanisms from different perspectives. However, integrative analysis and predictive modeling from multi-omics data face three major challenges: (i) heavy noise; (ii) high dimensionality relative to small sample sizes; and (iii) data heterogeneity. Current multi-omics data integration approaches have some limitations and are susceptible to heavy noise. In this paper, we present MSPL, a robust supervised multi-omics data integration method that simultaneously identifies significant multi-omics signatures during the integration process and predicts cancer subtypes. The proposed method not only inherits the generalization performance of self-paced learning but also leverages the correlated information contained in multi-omics data to interactively recommend high-confidence samples for model training. We demonstrate the capabilities of MSPL using simulated data and five multi-omics biological datasets, integrating up to three omics to identify potential biological signatures, and evaluate its performance against state-of-the-art methods on binary and multi-class classification problems. Our proposed model makes multi-omics data integration more systematic and expands its range of applications.
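The core idea of self-paced learning, which MSPL builds on, is to alternate between selecting "easy" (low-loss, high-confidence) samples and refitting the model on them, gradually admitting harder samples. The sketch below shows generic hard self-paced learning for a single-view logistic regression in NumPy; it does not reproduce MSPL's cross-omics sample recommendation, and the threshold `lam`, growth factor `mu`, helper names, and toy data with 10% flipped labels are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_loss(w, X, y):
    # Binary cross-entropy per sample; probabilities are clipped to avoid log(0).
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_selected(X, y, v, w, lr=0.1, steps=200):
    # Gradient descent on the logistic loss, counting only samples with v == 1.
    for _ in range(steps):
        grad = X.T @ (v * (sigmoid(X @ w) - y)) / max(v.sum(), 1.0)
        w = w - lr * grad
    return w

def self_paced_logreg(X, y, lam=0.7, mu=1.3, rounds=5):
    """Hard self-paced learning: alternate sample selection and model fitting."""
    w = fit_selected(X, y, np.ones(len(y)), np.zeros(X.shape[1]), steps=50)  # warm start
    v = np.ones(len(y))
    for _ in range(rounds):
        # Sample weight is 1 if the current loss is below lam ("easy"), else 0.
        v = (per_sample_loss(w, X, y) < lam).astype(float)
        w = fit_selected(X, y, v, w)
        lam *= mu  # raise the threshold so harder samples can enter later rounds
    return w, v

# Toy data: 2 features plus a bias column, with 10% of the labels flipped as noise.
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])
y_true = (X[:, 0] + X[:, 1] > 0).astype(float)
y = y_true.copy()
flipped = rng.choice(200, size=20, replace=False)
y[flipped] = 1 - y[flipped]

w, v = self_paced_logreg(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == y_true)
print(f"selected {int(v.sum())}/200 samples, accuracy vs. clean labels: {acc:.2f}")
```

Samples whose noisy labels contradict the emerging decision boundary tend to keep high losses and stay excluded, which is the robustness-to-noise property the abstract attributes to self-paced learning.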
- Research Article
12
- 10.3389/fgene.2024.1488683
- Dec 10, 2024
- Frontiers in Genetics
Multi-omics data integration has become increasingly crucial for a deeper understanding of the complexity of biological systems. However, effectively integrating and analyzing multi-omics data remains challenging due to their heterogeneity and high dimensionality. Existing methods often struggle with noise, redundant features, and the complex interactions between different omics layers, leading to suboptimal performance. Additionally, they have difficulty adequately capturing intra-omics interactions because of simplistic concatenation techniques, and they risk losing critical inter-omics interaction information when using hierarchical attention layers. To address these challenges, we propose a novel Denoised Multi-Omics Integration approach that leverages the Transformer multi-head self-attention mechanism (DMOIT). DMOIT consists of three key modules: a generative adversarial imputation network for handling missing values, a sampling-based robust feature selection module to reduce noise and redundant features, and a multi-head self-attention (MHSA) based feature extractor with a novel architecture that enhances the capture of intra-omics interactions. We validated model performance using cancer datasets from The Cancer Genome Atlas (TCGA) on two tasks: survival time classification across different cancer types and estrogen receptor status classification for breast cancer. Our results show that DMOIT outperforms traditional machine learning methods and the state-of-the-art integration method MoGCN in terms of accuracy and weighted F1 score. Furthermore, we compared DMOIT with various alternative MHSA-based architectures to further validate our approach; DMOIT consistently outperformed these models across various cancer types and omics combinations. The strong performance and robustness of DMOIT demonstrate its potential as a valuable tool for integrating multi-omics data across various applications.
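The Transformer multi-head self-attention mechanism referred to above can be sketched in NumPy as standard scaled dot-product attention computed in parallel over several heads. This is the generic mechanism only, not DMOIT's feature-extractor architecture; treating each token as one omics layer, the dimensions, and the random projection matrices are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (n_tokens, d_model). Returns outputs (n_tokens, d_model) and weights (n_heads, n_tokens, n_tokens)."""
    n, d = X.shape
    dh = d // n_heads
    # Project the inputs, then split the model dimension into n_heads chunks of size dh.
    Q = (X @ Wq).reshape(n, n_heads, dh).transpose(1, 0, 2)
    K = (X @ Wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    V = (X @ Wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently for each head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                  # (n_heads, n, dh)
    concat = heads.transpose(1, 0, 2).reshape(n, d)   # re-join the heads
    return concat @ Wo, attn

rng = np.random.default_rng(0)
d_model, n_heads, n_tokens = 16, 4, 3  # e.g. one token per omics layer (assumed)
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads)
print(out.shape, attn.shape)
```

Each row of `attn[h]` is a probability distribution over tokens, so every head learns its own weighting of how strongly each omics token attends to the others.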
- Research Article
82
- 10.1038/s41390-022-02181-x
- Jul 8, 2022
- Pediatric Research
Technological advances in omics evaluation, bioinformatics, and artificial intelligence have made us rethink ways to improve patient outcomes. Collective quantification and characterization of biological data including genomics, epigenomics, metabolomics, and proteomics is now feasible at low cost with rapid turnover. Significant advances in the integration methods of these multiomics data sets by machine learning promise us a holistic view of disease pathogenesis and yield biomarkers for disease diagnosis and prognosis. Using machine learning tools and algorithms, it is possible to integrate multiomics data with clinical information to develop predictive models that identify risk before the condition is clinically apparent, thus facilitating early interventions to improve the health trajectories of the patients. In this review, we intend to update the readers on the recent developments related to the use of artificial intelligence in integrating multiomic and clinical data sets in the field of perinatology, focusing on neonatal intensive care and the opportunities for precision medicine. We intend to briefly discuss the potential negative societal and ethical consequences of using artificial intelligence in healthcare. We are poised for a new era in medicine where computational analysis of biological and clinical data sets will make precision medicine a reality. IMPACT: Biotechnological advances have made multiomic evaluations feasible and integration of multiomics data may provide a holistic view of disease pathophysiology. Artificial Intelligence and machine learning tools are being increasingly used in healthcare for diagnosis, prognostication, and outcome predictions. Leveraging artificial intelligence and machine learning tools for integration of multiomics and clinical data will pave the way for precision medicine in perinatology.
- Research Article
1
- 10.3390/biology14121764
- Dec 10, 2025
- Biology
Integration of multi-omics data provides a comprehensive perspective on complex biological systems, facilitating advances in disease classification and biomarker discovery. However, the heterogeneity and high dimensionality of omics data present significant analytical challenges. To achieve effective and interpretable multi-omics integration, we propose a novel deep learning framework named MOGOLA (Multi-Omics integration by Gating and Omics-Linked Attention). MOGOLA consists of three core components: (1) a hybrid graph learning module that integrates Graph Convolutional Networks and Graph Attention Networks for intra-omics feature extraction; (2) a gating and confidence mechanism that adaptively weighs feature importance across different omics types; and (3) a cross-omics attention-based fusion module that captures inter-omics relationships. Comprehensive evaluations on four benchmark datasets (BRCA, KIPAN, ROSMAP, and LGG) demonstrate that MOGOLA consistently outperforms eleven state-of-the-art approaches. Ablation studies further validate the contribution of each module, while biomarker identification highlights the framework's clinical potential. These results show that MOGOLA is a robust and interpretable approach for multi-omics data integration and contributes to advances in computational biology and precision medicine.
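A Graph Convolutional Network layer of the kind named in MOGOLA's first component propagates each patient's features over a similarity graph. The sketch below implements the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A+I) D^-1/2 H W), in NumPy; it does not reproduce MOGOLA's hybrid GCN/GAT module, and the 4-patient graph, feature sizes, and random weights are assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops so each node keeps its own features
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric degree normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

# Toy patient-similarity graph for one omics type: 4 patients in a chain of similar pairs.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))  # 5 omics features per patient (hypothetical)
W = rng.normal(size=(5, 3))  # layer weights (random here; learned in practice)
H_next = gcn_layer(A, H, W)
print(H_next.shape)
```

In a multi-omics setting, one such graph and layer stack would typically be built per omics type before the per-omics embeddings are fused.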
- Research Article
4
- 10.1158/1538-7445.am2024-908
- Mar 22, 2024
- Cancer Research
Multi-omics research has enhanced our understanding of cancer heterogeneity and progression. Investigating molecular data through multi-omics approaches is crucial for unraveling the complex biological mechanisms underlying cancer, thereby enabling more effective diagnosis, treatment, and prevention strategies. However, predicting patient outcomes by integrating all available multi-omics data remains an under-studied research direction. Here, we present SeNMo (Self-normalizing Network for Multi-omics), a deep neural network that uses the self-normalizing technique to keep activations at zero mean and unit variance across network layers. Such normalizing techniques are critical for the stable and robust learning of deep learning models. SeNMo is particularly efficient in handling multi-omics data characterized by high width (many features) and low length (few samples). We trained SeNMo to predict patients' overall survival using pan-cancer multi-omics data covering 28 cancer sites from the Genomic Data Commons (GDC). The training multi-omics data include gene expression, DNA methylation, miRNA expression, and protein expression modalities. We tested the model on Moffitt Cancer Center's internal data involving RNA expression and protein expression data. We evaluated the model's performance in predicting patients' overall survival using the concordance index (C-Index), which provides a robust measure of predictive capability. SeNMo performed consistently well in the training regime, with a validation C-Index ≥ 0.6 on GDC's public data. In the testing regime on Moffitt's private data, SeNMo achieved a C-Index of 0.68. Performance increased to a C-Index of 0.7 when the model was tested on low-dimensional data or on single-omic data such as RNA or protein expression.
SeNMo proved to be a mini-foundation model for multi-omics oncology data because it demonstrated robust performance, adaptability across molecular data types, and universal approximator capabilities for the scale of molecular data it was trained on. SeNMo can be further scaled to any cancer site and molecular data type. It can also be fine-tuned for other downstream tasks such as treatment response prediction, risk stratification, patient subgroup identification, and others. Its ability to accurately predict patient outcomes and adapt to various downstream tasks indicates a new era in cancer research and treatment. For future research, SeNMo offers a powerful tool for uncovering deeper insights into the complex nature of cancer and sets a precedent for how artificial intelligence can be leveraged to handle the vast and intricate data in the biomedical field. We believe SeNMo and similar models are poised to transform the oncology landscape, offering hope for more effective, efficient, and patient-centric cancer care. Citation Format: Asim Waqas, Aakash Tripathi, Sabeen Ahmed, Ashwin Mukund, Paul Stewart, Mia Naeini, Hamza Farooq, Ghulam Rasool. SeNMo: A self-normalizing deep learning model for enhanced multi-omics data analysis in oncology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 908.
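The concordance index used to evaluate SeNMo measures how often a model ranks patients' risks in the same order as their observed survival. A minimal pure-Python sketch of Harrell's C-index follows; it is a simplified version that skips pairs with tied survival times, and the function name and toy data are assumptions, not the authors' evaluation code.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs whose predicted risks are correctly ordered.

    times:  observed survival or censoring times
    events: 1 if death was observed, 0 if censored
    risks:  predicted risk scores (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # subject i is known to have died first
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.2, 0.4, 0.7]
print(harrell_c_index(times, events, risks))  # → 0.6
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of 0.68 and 0.7 in context.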
- Research Article
3
- 10.1016/j.xops.2024.100629
- Oct 1, 2024
- Ophthalmology Science
Interplay between Lipids and Complement Proteins—How Multiomics Data Integration Can Help Unravel Age-related Macular Degeneration Pathophysiology: A Proof-of-concept Study
- Research Article
22
- 10.1016/j.compbiomed.2024.108058
- Jan 28, 2024
- Computers in Biology and Medicine
CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data
- Research Article
- 10.1007/978-3-032-18966-0_15
- Jan 1, 2026
- Advances in experimental medicine and biology
Human diseases are multi-factorial, affecting multiple aspects of a homeostatic system. Recent advances in high-throughput technology have allowed the generation of various omics datasets from large cohorts at affordable costs and hence made it possible to study the complex dynamical systems perturbed in human diseases. Studying the complex perturbed systems offers a mechanistic understanding to identify druggable targets and offers new avenues for individualised medical intervention. Mechanisms driving complex human diseases cannot be explored merely by single omics-focused studies. In addition, the heterogeneity among the human populations adds additional complexity and limits the possibility for inferring the regulatory mechanisms underlying these diseases. Examining the disease or phenotype of interest through the lens of multiple omics layers may allow the dissection of the perturbed biological processes associated with the disease. Studying a complex disease through multiple omics layers providing vast information is quite a challenging task and, therefore, requires statistical frameworks to achieve integrative multi-omics analysis. In this chapter, we first summarise key characteristics of each of the omics layers and the various considerations important for the implementation of statistical methods. We then shed light on the most common statistical methods used for multi-omics integration studies and highlight various published examples showing the use of these methods for addressing key biological questions. For this, we show integration examples focused on at least two prime omics layers. We next focus on methods and examples showing multi-omics integration to study dynamical systems in large cohort studies. Finally, we discuss some of the multi-omics approaches and examples from single-cell multi-omics datasets.
- Research Article
17
- 10.3389/fgene.2020.564792
- Nov 12, 2020
- Frontiers in Genetics
Pharmacogenomics is the study of how genes affect a person's response to drugs. Thus, understanding the effect of a drug at the molecular level can be helpful in both drug discovery and personalized medicine. Over the years, transcriptome data upon drug treatment have been collected, and several databases have been compiled containing either pre-treatment cancer cell multi-omics data with drug sensitivity (IC50, AUC) or time-series transcriptomic data after drug treatment. However, analyzing transcriptome data upon drug treatment is challenging since more than 20,000 genes interact in complex ways. In addition, because both time-series analysis and multi-omics integration are difficult, current methods struggle to analyze databases with different data characteristics. One effective way is to interpret transcriptome data in terms of well-characterized biological pathways. Another way is to leverage state-of-the-art methods for multi-omics data integration. In this paper, we developed Drug Response analysis Integrating Multi-omics and time-series data (DRIM), an integrative multi-omics and time-series data analysis framework that identifies perturbed sub-pathways and regulation mechanisms upon drug treatment. The system takes as input either a drug name and cell line identification numbers or the user's own control/treatment time-series gene expression data. Analysis of multi-omics data upon drug treatment is then performed from two perspectives. For the multi-omics perspective, IC50-related multi-omics potential mediator genes are determined by embedding multi-omics data into a gene-centric vector space using a tensor decomposition method and an autoencoder deep learning model. Then, perturbed pathway analysis of the potential mediator genes is performed. For the time-series perspective, time-varying perturbed sub-pathways upon drug treatment are constructed.
Additionally, a network involving transcription factors (TFs), multi-omics potential mediator genes, and perturbed sub-pathways is constructed, and paths to perturbed pathways from TFs are determined by an influence maximization method. To demonstrate the utility of our system, we provide analysis results of sub-pathway regulatory mechanisms in breast cancer cell lines of different drug sensitivity. DRIM is available at: http://biohealth.snu.ac.kr/software/DRIM/.
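The abstract does not specify the statistic behind DRIM's perturbed pathway analysis. One common way to test whether a candidate gene list (such as potential mediator genes) is over-represented in a pathway is a one-sided hypergeometric test; the standard-library sketch below illustrates that generic technique only, and the function name and toy counts are assumptions rather than DRIM output.

```python
from math import comb

def enrichment_p_value(n_universe, pathway_size, n_selected, overlap):
    """One-sided hypergeometric test: P(X >= overlap).

    n_universe:   genes measured (background)
    pathway_size: background genes annotated to the pathway
    n_selected:   genes in the candidate list (e.g. mediator genes)
    overlap:      candidate genes that fall in the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(pathway_size, n_selected)
    # Sum the exact probabilities of observing `overlap` or more pathway genes by chance.
    hits = sum(comb(pathway_size, k) * comb(n_universe - pathway_size, n_selected - k)
               for k in range(overlap, upper + 1))
    return hits / total

# Toy example: 20,000 background genes, a 100-gene pathway, 200 candidates, 10 of them in the pathway.
p = enrichment_p_value(20000, 100, 200, 10)
print(f"p = {p:.2e}")
```

Only about one pathway gene would be expected in the candidate list by chance here, so an overlap of 10 yields a very small p-value; in practice such p-values are usually corrected for testing many pathways.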
- Book Chapter
- 10.1007/978-3-319-94968-0_9
- Jan 1, 2018
The rapid accumulation of multi-omics cancer data has created opportunities for biological discovery and biomedical applications. In this study, we propose an approach that integrates multi-omics data to identify dysregulated pathways driving cancer subtypes, simultaneously considering DNA methylation, DNA copy number, somatic mutation, and gene expression profiles. After applying it to Breast Invasive Carcinoma (BRCA) in TCGA, we identified a distinct set of the top 30 dysregulated pathways for each breast cancer subtype. The results suggest that dysregulated pathways of different subtypes display both common and specific patterns. Furthermore, 44 differentially expressed genes with corresponding genetic and epigenetic dysregulation were retrieved from the subtype-specific pathways. Literature validation and functional enrichment analysis indicate that these genes are functionally associated with BRCA. Our method provides new insight into identifying the drivers of cancer subtypes through multi-omics data integration.
- Research Article
4
- Apr 12, 2023
- ArXiv
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
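Contrastive learning that "maximizes the mutual information between different types of omics" is commonly implemented with an InfoNCE-style loss, in which the two omics embeddings of the same patient form a positive pair and all other patients in the batch act as negatives. The NumPy sketch below shows a symmetric InfoNCE loss; CLCLSA's exact objective may differ, and the temperature of 0.1 and the toy mRNA/methylation embeddings are assumptions.

```python
import numpy as np

def log_softmax_rows(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss; row i of z_a and row i of z_b form a positive pair."""
    # Unit-normalize so the dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Cross-entropy with the matching patient (the diagonal) as target, in both directions.
    loss_ab = -np.mean(np.diag(log_softmax_rows(logits)))
    loss_ba = -np.mean(np.diag(log_softmax_rows(logits.T)))
    return 0.5 * (loss_ab + loss_ba)

rng = np.random.default_rng(0)
z_mrna = rng.normal(size=(8, 16))                  # hypothetical mRNA embeddings for 8 patients
z_meth = z_mrna + 0.05 * rng.normal(size=(8, 16))  # well-aligned methylation embeddings
misaligned = np.roll(z_meth, 1, axis=0)            # pair every patient with the wrong partner

loss_aligned = info_nce(z_mrna, z_meth)
loss_misaligned = info_nce(z_mrna, misaligned)
print(loss_aligned, loss_misaligned)
```

Minimizing this loss pulls the two omics views of each patient together in the shared latent space while pushing apart views from different patients.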
- Research Article
4
- 10.21203/rs.3.rs-2768563/v1
- May 2, 2023
- Research Square
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multiomics data.
- Research Article
42
- 10.1016/j.compbiomed.2023.106639
- Feb 11, 2023
- Computers in Biology and Medicine
Integrated analysis of multi-omics data for the discovery of biomarkers and therapeutic targets for colorectal cancer