Multi-omics integration for neuroblastoma clinical endpoint prediction

Margherita Francescatto, Giuseppe Jurman, Setareh Rezvan Dezfooli, Marco Chierici, Alessandro Zandonà, Cesare Furlanello

doi:10.1186/s13062-018-0207-8

Abstract

Background: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide a broader insight into the mechanisms of cancer biology, helping researchers and clinicians to develop personalized therapies.

Results: In the context of the CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework combining similarity network fusion with machine learning for the integration of multiple omics data. We apply the INF framework to the prediction of neuroblastoma patient outcome, integrating RNA-Seq, microarray and array comparative genomic hybridization data. We additionally explore the use of autoencoders as a method to integrate microarray expression and copy number data.

Conclusions: The INF method is effective for the integration of multiple data sources, providing compact feature signatures for patient classification with performance comparable to other methods. The latent space representation of the integrated data provided by the autoencoder approach gives promising results, both by improving classification on survival endpoints and by providing a means to discover two groups of patients characterized by distinct overall survival (OS) curves.

Reviewers: This article was reviewed by Djork-Arné Clevert and Tieliu Shi.

Highlights

High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers
Integration of multiple data sources marginally improves endpoint prediction. To evaluate the overall effect of data integration with respect to classification using the single datasets independently, we introduced the concept of Matthews Correlation Coefficient (MCC)
As a novel method for the integration of multiple omics data, the Integrative Network Fusion (INF) method is applied to the three datasets proposed for the CAMDA 2017 Neuroblastoma Data Integration challenge
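
The Matthews Correlation Coefficient named in the highlights is a standard score for binary classification that stays informative when classes are unbalanced. The snippet below is a minimal sketch of the plain MCC computed from a confusion matrix; it is not the authors' evaluation code, and the function names (`confusion_counts`, `mcc`) and the toy labels are hypothetical, chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1 (illustrative helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient in [-1, 1] for binary labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when a row or column of the matrix is empty
    return numerator / denominator if denominator else 0.0


# Toy labels (made up): 1 = event, 0 = no event
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(mcc(y_true, y_pred), 3))
```

A value of 1 indicates perfect prediction, 0 is equivalent to random guessing, and -1 indicates total disagreement, which makes scores from models trained on single datasets and on integrated data directly comparable.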

Summary

Introduction

High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. Neuroblastoma is a rare disease typically manifesting in early infancy, with an estimated 700 new cases diagnosed in the U.S. each year [1]. It is characterized by a very heterogeneous clinical course, ranging from spontaneous regression in some patients to relapse and eventual death despite prompt therapy in others [2]. Because of this heterogeneity, the ability to accurately predict the most likely disease outcome at the time of diagnosis is of extreme importance, especially given that accurate risk estimation allows an appropriate targeted therapy to be delivered [3]. Comprehensive integrative approaches that are effective across multiple clinical outcomes are still limited [5].

Methods

Results

Discussion

Conclusion