Abstract

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods to increase the accuracy of breast cancer survival predictive models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and its effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutation to currently molecular data including gene expression, copy number variation (CNV), methylation, and protein expression data for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was utilized to select features that present high relevance to survival and low redundancy among themselves for each type of data. The experimental results demonstrated that the proposed method achieved the most optimal performance and there was a remarkable improvement in the prediction performance when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that through employing promising omics data such as somatic mutations and harnessing the power of proper feature selection methods and effective integration frameworks, the breast cancer survival predictive accuracy can be further increased, thereby providing a more optimal clinical diagnosis and more effective treatment for breast cancer patients.

Highlights

  • Breast cancer is the most common malignant tumor in women

  • In order to gauge the performance of our method, first, the newly introduced method was compared with different single data types and integrated datasets to verify the effectiveness of somatic mutations, and the results indicated that there was a remarkable improvement in the prediction performance when somatic mutations were included

  • The three main steps include: (1) The most N-informative features were separately selected using the maximum relevance minimum redundancy (mRMR) method for each type of data in the learning dataset; (2) SimpleMKL with 10-fold cross-validation was deployed on the learning dataset for breast cancer prognosis to train an optimal model; and (3) the prediction model on learning dataset and the validation dataset were evaluated for their ability to learn data

Read more

Summary

Introduction

Breast cancer is the most common malignant tumor in women. there are millions of breast cancer survivors in the United States, breast cancer is the main cause of cancer-related deaths worldwide because of its high mortality rate (Ferlay et al, 2010). It is urgent to design highly accurate methods to predict the survival of breast cancer patients. The Cox regression model (Yuan et al, 2014; Xu et al, 2016) and traditional machine learning classification methods, such as support vector machine (SVM) (Xu et al, 2013), Bayes classifier (Gevaert et al, 2006), and random forest (RF) (Nguyen et al, 2013), have been widely deployed to identify breast cancer prognostic biomarkers. Multiple survival prediction models have been mainly developed based on gene expression data. The advancement of machine learning technologies enables various data types to be combined within a model (Chen et al, 2019; Lan et al, 2020), which may increase the accuracy of predictive models

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.