Abstract

Background: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; and (2) the mutational heterogeneity in most cancers, which means that groups of genes (e.g., pathways), rather than single genes, are the primary target of mutations.

Results: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail.

Conclusions: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.

Highlights

  • Cancer sequencing projects are measuring somatic mutations in large numbers of cancer genomes

  • In the first model we prove a bound on the number of patients required to detect all driver genes with high probability using a single gene test, while in the second model it is not possible to identify the driver genes using such a test for any number of patients

  • In the D>P model, the driver mutation probability is larger than the passenger mutation probability

Summary

Introduction

Cancer sequencing projects are measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from random passenger mutations that have accumulated in somatic cells but do not have functional consequences. The standard technique to identify recurrently mutated genes is to perform a single gene test, in which individual genes are tested to determine if their observed frequency of mutation is significantly higher than expected [1,2,3]. This approach has identified a number of important cancer genes, but has not revealed all of the driver mutations and driver genes in individual cancers.
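To make the single gene test concrete, the following is a minimal sketch rather than the exact test used in the cited work. It assumes that passenger mutations hit each base independently at the BMR, so that the number of patients with a mutated gene follows a binomial distribution. The function names, the gene length, the patient counts, and the two BMR values are all hypothetical and chosen only for illustration; multiple-testing correction across genes is omitted.

```python
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """Return P[X >= k] for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1.0 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def gene_mutation_probability(bmr, gene_length):
    """Probability that a gene has at least one passenger mutation in a patient.

    Assumption: each of the gene_length bases mutates independently at rate bmr.
    """
    return 1.0 - (1.0 - bmr) ** gene_length


def single_gene_p_value(mutated_patients, total_patients, bmr, gene_length):
    """P-value for seeing at least mutated_patients mutated samples by chance."""
    q = gene_mutation_probability(bmr, gene_length)
    return binomial_tail(mutated_patients, total_patients, q)


# Hypothetical example: one gene, 200 patients, 6 of them mutated,
# evaluated under two different BMR estimates.
p_low_bmr = single_gene_p_value(6, 200, bmr=1e-6, gene_length=1500)
p_high_bmr = single_gene_p_value(6, 200, bmr=1e-5, gene_length=1500)
print(p_low_bmr, p_high_bmr)
```

Under these assumptions the same observed count yields very different p-values for the two BMR estimates, which illustrates why the outcome of a single gene test depends so strongly on how the BMR is estimated.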

