Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling.

Marco Chierici, Cesare Furlanello, Alessandro Zandonà, Margherita Francescatto, Lucia Trastulla, Claudio Agostinelli, Nicole Bussola, Alessia Marcolini, Giuseppe Jurman

doi:10.3389/fonc.2020.01065

Abstract

Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets that encompass multiple omics layers, with detailed clinical information, for large collections of samples. This has created a need for computational methods that improve cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers by exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF that use Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier; two baseline RF and LSVM models are also trained on juXT. Finally, a compact RF model, called rSNFi, is trained on the intersection of the top-ranked biomarkers from the two approaches, juXT and rSNF. All classifiers are run in a 10x5-fold cross-validation schema to ensure reproducibility, following the guidelines for an unbiased Data Analysis Plan set by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, compared with juXT, INF achieved similar Matthews Correlation Coefficient (MCC) values with 83% to 97% smaller feature sizes (FS) for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), and it improved performance on KIRC-OS (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than single-omics models, with smaller signatures as well, and transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over both single layers alone and naive juxtaposition, while providing compact signatures.
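The MCC values and feature-size reductions quoted above can be reproduced from predicted labels and signature sizes. The following is a minimal, illustrative Python sketch (standard library only), not the INF code base; the function names mcc and fs_reduction are hypothetical. It uses the multiclass generalization of MCC, which reduces to the usual binary TP/TN/FP/FN formula, so the same function would cover both the binary BRCA-ER task and the multiclass BRCA-subtypes task.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Matthews Correlation Coefficient for binary or multiclass labels.

    Multiclass generalization: with s samples, c correct predictions,
    t_k true and p_k predicted counts for class k,
    MCC = (c*s - sum p_k t_k) / sqrt((s^2 - sum p_k^2) * (s^2 - sum t_k^2)).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    # Degenerate case (a single class predicted or present): define MCC as 0
    return 0.0 if denom == 0 else cov_tp / denom


def fs_reduction(fs_model: int, fs_baseline: int) -> float:
    """Fractional reduction in signature size relative to a baseline model."""
    return 1.0 - fs_model / fs_baseline
```

For example, fs_reduction(56, 1801) is about 0.969 and fs_reduction(302, 1801) is about 0.832, which matches the 97% and 83% reductions reported for BRCA-ER and BRCA-subtypes.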

Highlights

The challenge of integrating multi-omics data is as old as bioinformatics itself [1, 2], but despite a wide literature it remains an open issue today, one that major institutions still consider worth funding.
This study introduces Integrative Network Fusion (INF), a reproducible network-based framework for high-throughput omics data integration that leverages machine learning models to extract multi-omics predictive biomarkers.
Experiments are run on samples with randomly shuffled labels as a sanity check against overfitting, and INF robustness is verified by testing on different train/test splits.
The INF workflow was run on all tasks, considering 3-layer integration and all 2-layer combinations; the Data Analysis Plan (DAP) was run separately on each single-layer dataset to obtain a baseline.
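The evaluation safeguards listed in these highlights (repeated cross-validation over different train/test splits, plus a random-label control) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names repeated_kfold and shuffled_labels are hypothetical, the folds are not stratified, and the actual DAP follows the MAQC/SEQC guidelines and may differ in such details.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(n_samples: int, n_splits: int = 5, n_repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for n_repeats x n_splits-fold CV.

    Each repetition reshuffles the sample indices and partitions them into
    n_splits folds whose sizes differ by at most one (no stratification).
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    for _ in range(n_repeats):
        rng.shuffle(indices)
        start = 0
        for size in fold_sizes:
            test = indices[start:start + size]  # slicing copies the indices
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test
            start += size


def shuffled_labels(labels: Sequence, seed: int = 0) -> list:
    """Return a random permutation of the labels for a random-label run."""
    permuted = list(labels)
    random.Random(seed).shuffle(permuted)
    return permuted
```

With the defaults this yields 50 train/test splits, as in a 10x5-fold schema. A model that still reaches an MCC well above 0 when trained and evaluated on shuffled_labels(y) would suggest information leakage or overfitting somewhere in the pipeline.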

Summary

Introduction

This study introduces Integrative Network Fusion (INF), a reproducible network-based framework for high-throughput omics data integration that leverages machine learning models to extract multi-omics predictive biomarkers. First conceptualized and tested on multi-omics metagenomics data in a preliminary version [3, 4], INF combines the signatures retrieved from both the early-integration approach of variable juxtaposition (juXT) and an intermediate-integration approach, Similarity Network Fusion (SNF) [5], to find the optimal set of predictive features. A feature ranking scheme (rSNF) is computed on SNF-integrated features, and an RF model (rSNFi) is trained on the intersection of the two sets of top-ranked features from juXT and rSNF; the resulting approach effectively integrates multiple omics layers and provides compact predictive signatures. Experiments are run on samples with randomly shuffled labels as a sanity check against overfitting, and INF robustness is verified by testing on different train/test splits.
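The two integration steps that surround the SNF stage are simple set and dictionary operations: early juxtaposition (juXT), and the final intersection of top-ranked features used to train rSNFi. The Python sketch below is illustrative only. The data layout, the helper names juxtapose and rsnfi_features, and the use of one shared cut-off top_k for both rankings are assumptions, and SNF fusion and rSNF ranking themselves are not reproduced here.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: layer name -> {sample_id: {feature_name: value}}
Layers = Dict[str, Dict[str, Dict[str, float]]]


def juxtapose(layers: Layers) -> Dict[str, Dict[str, float]]:
    """Early integration (juXT): concatenate each sample's features across layers.

    Only samples present in every layer are kept, and feature names are
    prefixed with their layer so that, e.g., a gene measured both as
    expression and as copy number yields two distinct features.
    """
    common = set.intersection(*(set(layer) for layer in layers.values()))
    merged: Dict[str, Dict[str, float]] = {}
    for sample in sorted(common):
        row: Dict[str, float] = {}
        for layer_name, layer in layers.items():
            for feature, value in layer[sample].items():
                row[f"{layer_name}:{feature}"] = value
        merged[sample] = row
    return merged


def rsnfi_features(juxt_ranking: Sequence[str], rsnf_ranking: Sequence[str],
                   top_k: int) -> List[str]:
    """Intersect the top_k features of two best-first rankings.

    The result keeps the juXT ordering; a single shared cut-off top_k
    for both rankings is an assumption of this sketch.
    """
    top_rsnf = set(rsnf_ranking[:top_k])
    return [f for f in juxt_ranking[:top_k] if f in top_rsnf]
```

Because the intersection can only shrink the two top-ranked lists, the feature set passed to rSNFi is never larger than either one. This is consistent with the compact signatures reported for INF.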

Journal: Frontiers in Oncology	Publication Date: Jun 30, 2020
Citations: 36	License type: CC BY 4.0

Similar Papers

Gene Expression Machine Learning Models Classify Pediatric AML Subtypes with High Performance
Krish Shah ... Guangchun Song
Blood | VOL. 142
28 Nov 2023

Comparative Analysis of the Ability of Machine Learning Models in Predicting In-hospital Postoperative Outcomes After Total Hip Arthroplasty.
Mouhanad M El-Othmani ... Roshan P Shah
Journal of the American Academy of Orthopaedic Surgeons | VOL. 30
09 Aug 2022

Monitoring Variables Influence on Random Forest Models to Forecast Injuries in Short-Track Speed Skating.
Jérémy Briand ... Sylvain Gaudet
Frontiers in sports and active living | VOL. 4
14 Jul 2022

GSEA–SDBE: A gene selection method for breast cancer classification based on GSEA and analyzing differences in performance metrics
Hu Ai ... Nguyen Quoc Khanh Le
26 Apr 2022
