Abstract

Cancer initiation and progression are caused by driver mutations that are vastly outnumbered by the passenger mutations that accumulate due to cancer-associated genome instability. With genome-wide detection of somatic mutations now becoming commonplace for moderate-sized cancer studies, improvements in methods for discriminating driver from passenger mutations would significantly advance the field of cancer biology. In large-cohort studies, recurrence of mutations within a regulatory element can be used to identify probable driver mutations; but for small-cohort studies of new cancer types or subtypes, the recurrence approach by itself has limited statistical power. In such cases, bioinformatic approaches work well for functional assessment of somatic mutations within protein-coding genes, but how to functionally assess noncoding somatic mutations, using large-scale datasets of measurements and information about the local genomic context of the mutation, is a fundamental open problem in bioinformatics. Based on recent reports of specific noncoding mutations that drive cancer progression, we proposed and investigated a recurrence-based regression approach for quantifying the cancer-promoting potential of the local genomic and chromatin context of a somatic mutation. We integrated 29 genomic correlates (from sequence conservation, sequence GC content, distance to the nearest gene, and ENCODE project genome location datasets) within seven different regression models in three model classes: generalized linear models (GLMs), ensemble decision tree models, and neural network models. We trained and tested the models using a combined dataset of 4.5 million noncoding somatic mutations from 20 different types of cancer. We then characterized the models' accuracies and obtained relative importance scores for the features. We found that the Poisson regression model performs best among the regression models and that a deep neural network structure is promising for predicting noncoding mutation recurrence.
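
To make the idea of regressing mutation recurrence on genomic correlates concrete, the following is a minimal sketch of a log-link Poisson regression fitted by Newton's method. It is not the authors' implementation: the three synthetic features stand in for the 29 genomic correlates, the counts are simulated rather than taken from the mutation dataset, and the names `fit_poisson_regression` and `predict_rate` are hypothetical.

```python
import numpy as np


def fit_poisson_regression(X, y, n_iter=50, tol=1e-8):
    """Fit a log-link Poisson GLM by Newton's method.

    X: (n, p) feature matrix (hypothetical stand-ins for genomic correlates).
    y: (n,) non-negative integer counts (e.g. recurrence counts).
    Returns the coefficient vector with the intercept first.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(y.mean() + 1e-12)  # start at the mean-only model
    for _ in range(n_iter):
        mu = np.exp(X1 @ beta)            # predicted rates under the log link
        grad = X1.T @ (y - mu)            # score of the Poisson log-likelihood
        hess = X1.T @ (X1 * mu[:, None])  # Fisher information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_rate(X, beta):
    """Predicted mean count for each row of X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.exp(X1 @ beta)


# Simulated data (assumption): three standardized features, known coefficients.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
true_beta = np.array([-1.0, 0.5, -0.3, 0.0])
y = rng.poisson(np.exp(true_beta[0] + X @ true_beta[1:]))

beta_hat = fit_poisson_regression(X, y)
rates = predict_rate(X, beta_hat)
print("estimated coefficients:", np.round(beta_hat, 3))
```

In a sketch like this, each fitted coefficient gives the change in log expected recurrence per unit change in its feature, which is one simple way to read off how strongly a correlate is associated with recurrence.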

