Abstract

Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers, we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources; these include the recently validated cancer-associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.

Highlights

  • Advances in technology have made exome and whole-genome sequencing commonplace and have been catalysts for large-scale concerted cancer genome sequencing efforts such as TCGA[1] and ICGC[2]

  • We grouped somatic missense mutations from 9,950 whole exome samples in COSMIC v71 into 22 cancer types, from which we identified a total of 259 mutationally enriched families (MutFams) (p < 0.05, permutation test with Benjamini-Hochberg correction)

  • There is a correlation between the number of mutations and the number of MutFams identified (Pearson’s r = 0.84, p < 0.0001; Supplementary Fig. 1), an observation supported by a recent study reporting a correlation between the overall mutation burden of a cancer type and the number of driver genes identified[46]
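
A minimal sketch of how a permutation test with Benjamini-Hochberg correction might be set up for family-level mutation enrichment. The null model (mutations placed uniformly at random along the sequence), the function names and all parameters are illustrative assumptions, not the procedure used to derive the MutFams.

```python
import random


def permutation_pvalue(observed, family_length, total_length,
                       n_mutations, n_perm=2000, seed=0):
    """Empirical one-sided p-value for enrichment in one family.

    Assumed null model: each of `n_mutations` lands uniformly at random
    on a sequence of `total_length` residues, of which `family_length`
    belong to the family of interest.
    """
    rng = random.Random(seed)
    frac = family_length / total_length
    at_least_as_extreme = 0
    for _ in range(n_perm):
        count = sum(1 for _ in range(n_mutations) if rng.random() < frac)
        if count >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical usage: one p-value per family, then correct across families.
raw = [
    permutation_pvalue(30, 100, 10_000, 200, n_perm=500),  # heavily mutated
    permutation_pvalue(2, 100, 10_000, 200, n_perm=500),   # near expectation
]
enriched = [q < 0.05 for q in benjamini_hochberg(raw)]
```

A real analysis would replace the uniform null with a background model that accounts for sequence context and cancer-type mutation rates; the correction step is independent of that choice.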


Summary

Introduction

Advances in technology have made exome and whole-genome sequencing commonplace and have been catalysts for large-scale concerted cancer genome sequencing efforts such as TCGA[1] and ICGC[2]. Mutations observed in tumours may be drivers, positively influencing tumour progression, or passengers, which are incidental and have no net effect[3]. Methods such as MutSigCV[4] analyse somatic mutations from tumour samples to identify sequence positions mutated above a significance threshold using a sophisticated model of background mutation rates. Because domains are evolutionarily related, discrete and independently folding units of sequence that are often found in multiple genes and in different contexts (i.e. multiple domain architectures), domain enrichment may both enhance the statistical power for driver detection and allow clearer prediction of the functional impacts of mutations[20,21,22]. Derived functional interaction networks[37], built from both known and predicted protein-protein interactions, can be used to analyse driver gene lists for functional consequences, for example by testing the identified network modules for GO term enrichment. Protein-protein interaction networks such as STRING[38] permit topological predictions, such as whether a particular gene is a hub or how dispersed a set of genes is on the network[39].
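
The two topological questions mentioned above (is a gene a hub, and how dispersed is a gene set) can be illustrated with a small sketch. It assumes an undirected, unweighted network given as an edge list, uses node degree as a simple proxy for hub status, and uses mean pairwise shortest-path distance as a dispersion measure. The node names and edges are hypothetical placeholders, not data from STRING, and the measures are illustrative choices rather than those of the cited studies.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path_length(adj, source, target):
    """Breadth-first search distance; None if the nodes are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def mean_pairwise_distance(adj, genes):
    """Average shortest-path distance over all reachable gene pairs."""
    dists = [shortest_path_length(adj, a, b) for a, b in combinations(genes, 2)]
    reachable = [d for d in dists if d is not None]
    return sum(reachable) / len(reachable) if reachable else None


# Hypothetical toy network with placeholder node names.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
toy_adj = adjacency(toy_edges)
degree = {node: len(nbrs) for node, nbrs in toy_adj.items()}
hub = max(degree, key=degree.get)                       # highest-degree node
dispersion = mean_pairwise_distance(toy_adj, ["B", "C", "E"])
```

A lower mean pairwise distance indicates that a gene set sits close together on the network; in practice this would be compared against randomly sampled gene sets of the same size.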

