Abstract

Background: Breast cancer is the most prevalent and among the most deadly cancers in females. Patients with breast cancer have highly variable survival lengths, indicating a need to identify prognostic biomarkers for personalized diagnosis and treatment. With the development of new technologies such as next-generation sequencing, multi-omics information is becoming available for a more thorough evaluation of a patient's condition. In this study, we aim to improve breast cancer overall survival prediction by integrating multi-omics data (e.g., gene expression, DNA methylation, miRNA expression, and copy number variations (CNVs)).

Methods: Motivated by multi-view learning, we propose a novel strategy to integrate multi-omics data for breast cancer survival prediction by applying the complementary and consensus principles. The complementary principle assumes that each omics modality contains modality-unique information. To preserve such information, we develop a concatenation autoencoder (ConcatAE) that concatenates the hidden features learned from each modality for integration. The consensus principle assumes that the disagreements among modalities upper-bound the model errors. To remove noise and discrepancies among modalities, we develop a cross-modality autoencoder (CrossAE) that maximizes the agreement among modalities to achieve a modality-invariant representation. We first validate the effectiveness of our proposed models on the MNIST simulated data. We then apply these models to the TCGA breast cancer multi-omics data for overall survival prediction.

Results: For breast cancer overall survival prediction, the integration of DNA methylation and miRNA expression achieves the best overall performance, with a concordance index (C-index) of 0.641 ± 0.031 with ConcatAE and 0.63 ± 0.081 with CrossAE. Both strategies outperform baseline single-modality models using only DNA methylation (0.583 ± 0.058) or miRNA expression (0.616 ± 0.057).

Conclusions: We achieve improved overall survival prediction performance by utilizing either the complementary or the consensus information among multi-omics data. The proposed ConcatAE and CrossAE models can inspire future deep representation-based multi-omics integration techniques. We believe these novel multi-omics integration models can benefit the personalized diagnosis and treatment of breast cancer patients.
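
The two integration principles can be illustrated on toy latent vectors. The sketch below is not the authors' ConcatAE or CrossAE implementation, whose architectures and losses are not given in this excerpt. It assumes each modality has already been encoded into a hidden feature vector of equal length, and the function names (`concat_fusion`, `disagreement`, `consensus_fusion`), the mean-squared disagreement measure, and the element-wise mean are all hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations


def concat_fusion(latents):
    """Complementary principle (assumed form): keep every modality's
    hidden features side by side by concatenating them."""
    fused = []
    for z in latents:
        fused.extend(z)
    return fused


def disagreement(latents):
    """Consensus principle (assumed form): mean squared difference between
    every pair of modality embeddings. A training loss could penalize this
    value so that the embeddings agree; lower means more agreement."""
    pairs = list(combinations(latents, 2))
    total = 0.0
    for za, zb in pairs:
        total += sum((a - b) ** 2 for a, b in zip(za, zb)) / len(za)
    return total / len(pairs)


def consensus_fusion(latents):
    """Element-wise mean across modalities, used here as a simple stand-in
    for a modality-invariant representation."""
    n = len(latents)
    return [sum(values) / n for values in zip(*latents)]


# Hypothetical hidden features for two modalities of one patient.
z_methylation = [0.2, -1.0, 0.5]
z_mirna = [0.4, -0.8, 0.1]

fused_concat = concat_fusion([z_methylation, z_mirna])        # length 6
penalty = disagreement([z_methylation, z_mirna])              # scalar
fused_consensus = consensus_fusion([z_methylation, z_mirna])  # length 3
```

Concatenation doubles the feature dimension but loses nothing from either modality, whereas the consensus variant keeps the dimension fixed and discards whatever the modalities disagree on.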

Highlights

  • Breast cancer is the most prevalent and among the most deadly cancers in females

  • Novel multi-modality integration network: We develop novel multi-omics integration networks based on two principles in multi-view machine learning: 1) the complementary principle, which assumes that each view contains information other views do not have, so we should extract the difference from each view while preserving the common information; and 2) the consensus principle, which assumes that the disagreements between views upper-bound the classification errors, so we should aim to maximize the agreement between views

  • The ConcatAE model integrating DNA methylation and miRNA expression principal component analysis (PCA) features achieves the best performance, with a concordance index (C-index) of 0.641 ± 0.031, and outperforms the cross-modality autoencoder (CrossAE) model (0.63 ± 0.081)
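
The C-index reported above measures how often a model ranks patients in the correct order of survival. As a minimal sketch, assuming the common Harrell formulation (the excerpt does not state which variant the authors use), it can be computed as follows. The function name and the decision to skip pairs with tied survival times are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index (assumed variant).

    times  : observed survival or censoring times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher = shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the patient who died earlier
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four hypothetical patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
score = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.58 to 0.64 in context.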

Introduction

Breast cancer is the most prevalent and among the most deadly cancers in females. Patients with breast cancer have highly variable survival lengths, indicating a need to identify prognostic biomarkers for personalized diagnosis and treatment. In 2018, breast cancer accounted for over 25% of the approximately 8.5 million new cancer diagnoses in female patients [1]. The same pattern holds in the US, where women have a lifetime risk of over 12% of being diagnosed with breast cancer, and breast cancer is expected to account for about 30% of new cancer cases [2]. Survival rates for breast cancer are typically measured by 5-year post-diagnosis survival. If each cancer stage is considered separately, the 5-year survival rate is 99% for localized breast cancer and drops to 85% and 27% for regionally and distantly spread cancer, respectively.
