Abstract

Background
Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited.

Methods
In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. We then propose an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble aggregates the scores predicted by the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding in NCBI.

Results
This study focuses on several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of the predictions are discussed based on gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) using the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those for 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from The Cancer Genome Atlas (TCGA). With EARN, the ROC-AUC reaches 99.24% for MBCA and 99.79% for BRCA, which is better than each of the three individual classifiers in both cases.

Conclusions
This integrative approach can assist precision oncologists in designing compact targeted panels that eliminate the need for whole-genome/exome sequencing. A schematic representation of the proposed model is presented as the Graphic abstract.
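The score aggregation and ROC-AUC comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so the unweighted mean used here is an assumption, and the gene names, scores, and labels are invented placeholders, not data or results from the study. The function names are hypothetical as well.

```python
from statistics import mean

# Placeholder per-classifier driver scores in [0, 1].
# All values are invented for illustration; none come from the study.
scores = {
    "GENE_A": {"ann": 0.91, "rf": 0.85, "svm": 0.88},
    "GENE_B": {"ann": 0.40, "rf": 0.55, "svm": 0.35},
    "GENE_C": {"ann": 0.15, "rf": 0.10, "svm": 0.20},
    "GENE_D": {"ann": 0.70, "rf": 0.30, "svm": 0.65},
}
# Placeholder ground truth: 1 = driver, 0 = passenger.
labels = {"GENE_A": 1, "GENE_B": 0, "GENE_C": 0, "GENE_D": 1}


def ensemble_score(per_model):
    """Aggregate base-classifier scores by unweighted mean (assumed rule)."""
    return mean(per_model.values())


def rank_genes(all_scores):
    """Return (gene, ensemble score) pairs sorted from highest to lowest score."""
    aggregated = {gene: ensemble_score(s) for gene, s in all_scores.items()}
    return sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)


def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


ranking = rank_genes(scores)
genes = [gene for gene, _ in ranking]
auc = roc_auc([labels[g] for g in genes], [s for _, s in ranking])
```

The same `roc_auc` function can be applied to each base classifier's scores separately, which is the kind of ensemble-versus-individual comparison the abstract reports.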

Highlights

  • Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer

  • We evaluate the top genes predicted by Ensemble of Artificial Neural Network (EARN) and three base classifiers for breast invasive carcinoma (BRCA) and metastatic breast cancer (MBCA) by searching these genes in the list of cancer-associated genes in the public databases, including the Online Mendelian Inheritance in Man (OMIM), the Cancer Gene Census (CGC) [47], the Network of Cancer Genes (NCG) [48, 49], and the human cancer metastasis database (HCMDB) [50]

  • Driver genes and passengers predicted by three individual machine learning methods (non-linear Support Vector Machine (NLSVM), Artificial Neural Network (ANN), and Random Forest (RF)) and by the proposed ensemble classifier (EC) are introduced
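The database lookup mentioned in the highlights can be sketched as a set-membership check on the top-ranked genes. This is only an illustration under stated assumptions: the reference sets below are invented placeholders standing in for lists exported from resources such as OMIM, CGC, NCG, or HCMDB, and the function names are hypothetical.

```python
def top_k_hits(ranked_genes, reference_sets, k):
    """For each reference list, return the top-k predicted genes found in it."""
    top = ranked_genes[:k]
    return {
        name: [gene for gene in top if gene in ref]
        for name, ref in reference_sets.items()
    }


def supported_fraction(ranked_genes, reference_sets, k):
    """Fraction of the top-k predicted genes present in at least one list."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    known = set().union(*reference_sets.values())
    return sum(gene in known for gene in top) / len(top)


# Placeholder ranking and reference lists (invented, not real database content).
ranked = ["GENE_A", "GENE_D", "GENE_B", "GENE_C"]
references = {
    "db_one": {"GENE_A", "GENE_X"},
    "db_two": {"GENE_D", "GENE_A"},
}

hits = top_k_hits(ranked, references, k=3)
fraction = supported_fraction(ranked, references, k=3)
```

Reporting hits per database, rather than a single pooled count, keeps it visible which resource supports each predicted gene.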


Summary

Introduction

Many markers are available for the prognosis and diagnosis of complex diseases such as primary breast cancer. Precision oncology, by focusing on targeted clinical panel sequencing, can help identify new treatment targets [5]; for example, a breast cancer-specific NGS panel including 79 genes has been validated for use in identifying primary and metastatic breast cancer [6]. In this way, the advent of bioinformatics tools, in parallel with the development of molecular techniques, could lead to the discovery of biomarkers that are efficient in cancer diagnosis and prognosis [7]. Machine learning algorithms, as one class of computational approaches, can be trained on data from countless patients, whereas it is too difficult for human physicians and biologists to gain such experience over an entire career or through their own research. These models equip experts to make better decisions [8]. Twenty-two approaches to analyzing breast cancer data have been reported in the literature (Table 1).

