Abstract

In-silico classification of the pathogenic status of somatic variants has shown promise for promoting the clinical use of genetic tests. The majority of available classification tools are designed around the characteristics of germline variants, or of germline and somatic variants combined. Given the significance of somatic variants in cancer initiation and progression, there is a need for classifiers that specialize in the pathogenic status of cancer somatic variants and are trained on cancer somatic variants. We established a gold standard consisting exclusively of cancer somatic single nucleotide variants (SNVs) collected from the Catalogue Of Somatic Mutations In Cancer (COSMIC). We developed two support vector machine (SVM) classifiers based on genomic features of cancer somatic SNVs located in the coding and non-coding regions of the genome, respectively. The SVM classifiers achieved areas under the ROC curve (AUC) of 0.94 and 0.89 for classifying the pathogenic status of coding and non-coding cancer somatic SNVs, respectively. Our models outperform two well-known classification tools, FATHMM-XF and CScape, in classifying both coding and non-coding cancer somatic variants. Furthermore, we applied our models to predict the pathogenic status of somatic variants identified in young breast cancer patients from the METABRIC and TCGA-BRCA studies. With a classification threshold of 0.8, our “coding” model predicted 1,853 positive SNVs (out of 6,910) in the TCGA-BRCA dataset and 500 positive SNVs (out of 1,882) in the METABRIC dataset. Interestingly, through comparative survival analysis of the positive predictions from our models, we identified a young-specific pathogenic somatic variant with potential prognostic value for early-onset breast cancer in young women.
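The abstract reports performance as AUC and calls variants positive above a score threshold of 0.8. The sketch below illustrates both quantities in plain Python. It is not the authors' pipeline: the function names `roc_auc` and `count_positive` are hypothetical, the scores and labels are invented for illustration, and treating a score exactly equal to the threshold as positive is an assumption.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive (label 1) scores higher than a randomly chosen negative
    (label 0); tied pairs count as half (Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg); fine for a small illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def count_positive(scores: Sequence[float], threshold: float = 0.8) -> int:
    """Number of variants called pathogenic at the given threshold.
    Assumption: a score equal to the threshold counts as positive."""
    return sum(1 for s in scores if s >= threshold)


if __name__ == "__main__":
    # Hypothetical classifier scores and gold-standard labels (1 = pathogenic).
    scores = [0.95, 0.85, 0.80, 0.40, 0.30, 0.10]
    labels = [1, 1, 0, 1, 0, 0]
    print(round(roc_auc(scores, labels), 3))  # → 0.889
    print(count_positive(scores))             # → 3

    # Positive-call rates implied by the counts reported in the abstract.
    print(f"TCGA-BRCA: {1853 / 6910:.1%}")  # → 26.8%
    print(f"METABRIC: {500 / 1882:.1%}")    # → 26.6%
```

In the toy data, 8 of the 9 positive/negative pairs are ranked correctly, giving an AUC of about 0.889. The last two lines show that the reported counts correspond to roughly a quarter of the SNVs being called positive in each cohort.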

Highlights

  • We studied the landscape of somatic single nucleotide variants (SNVs) in young breast cancer patients by applying our trained models to data from two cohort studies, METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) (Pereira et al, 2016) and TCGA-BRCA (The Cancer Genome Atlas Breast Invasive Carcinoma) (Grossman et al, 2016), both of which profile the genomes of breast cancer tumors

  • Computational models for classifying the pathogenic status of cancer somatic variants located in coding and non-coding regions of the genome were developed in this study

Summary

Introduction

The adoption of high-throughput sequencing technologies has produced an ever-increasing list of sequenced genes, exomes, transcriptomes and genomes. Genomic variants identified through sequencing can be associated with susceptibility to complex diseases such as cancer. This applies particularly to variants that affect genes involved in critical cellular processes such as cell-cycle regulation, DNA mismatch repair, metabolism and immunity (Landau et al, 2015; Oldridge et al, 2015). Since the first somatic mutation in a human oncogene was reported (Reddy et al, 1982; Tabin et al, 1982), a substantial number of oncogenes and their associated somatic mutations have been identified (Futreal et al, 2004). These mutations can be either pathogenic driver variants, which confer fitness advantages on tumor cells (Hodis et al, 2012), or benign passenger variants, which are biologically neutral and provide no growth or survival advantage (Greenman et al, 2007). The biggest challenge in any systematic mutation screen is distinguishing between these two groups of variants.
