Abstract

In cancer, somatic driver mutations often target “hotspots” of paralogous residues across evolutionarily related members of a single protein family. Mutation hotspots usually confer a dominant oncogenic function and are among the most druggable cancer sequence alterations (e.g. EGFR L858R, BRAF V600E). Protein families targeted in this manner include the Ras and receptor tyrosine kinase families (1). Protein family hotspots are not directly assessed by standard computational approaches for statistically nominating genes under positive somatic selection in cancer, such as MutSig (2), InVex (3), or MuSiC (4). These methods assess only single genes for statistical enrichment of total mutation burden or the presence of mutation hotspots, and thus are powered to detect protein family hotspots only when at least one protein family member is very frequently mutated (e.g. KRAS, NRAS, HRAS). In particular, they are not powered to detect hotspots in highly functionally redundant families, in which each member may be mutated in far fewer than 1% of cases. To directly identify protein-family hotspots, we have developed a computational framework, MutPfam. This framework superimposes mutation data onto Pfam (http://pfam.sanger.ac.uk/) protein family and subfamily multiple sequence alignments and identifies statistically significant hotspots. We demonstrate our method on the Ras Pfam family (PF00071.17) using the TCGA Pan-Cancer dataset (https://tcga-data.nci.nih.gov/), comprising somatic mutation calls for 4608 whole-exome-sequenced patients across 21 tumor types. Using phylogenetic analysis of this 134-member protein family, we analyzed 110 subfamilies for the presence of significantly mutated hotspots. In addition to canonical KRAS, NRAS, and HRAS hotspots in the switch I (e.g. KRAS G12V, KRAS G13D) and switch II (e.g. HRAS Q61R) domains, we identified several other hotspots achieving significance following Bonferroni correction.
Among our top hits were a recently identified melanoma hotspot in RAC1 (3), previously uncharacterized paralogues of a known hotspot in RHOA (RHOA E40Q, RALA D49G, RND1 E48K), and a novel recurrently mutated position in the RAB subfamily (e.g. RAB10 R28C, RAB11B R30H, RAB17 R38W). As we demonstrate, our framework enables the de novo discovery of mutation hotspots across protein families, including those that are rarely mutated in any single family member. Computational issues meriting future investigation include (1) alternate definitions of protein alignments and subfamilies (e.g. involving three-dimensional structure or biophysical constraints), (2) improved statistical modeling of the local background mutation rate, and (3) incorporation of variant functional impact scores for missense variants.

Citation Information: Mol Cancer Ther 2013;12(11 Suppl):A22.

Citation Format: Marcin Imielinski, Charles Du, Matthew Meyerson. Identifying somatic mutation hotspots across protein family alignments. [abstract]. In: Proceedings of the AACR-NCI-EORTC International Conference: Molecular Targets and Cancer Therapeutics; 2013 Oct 19-23; Boston, MA. Philadelphia (PA): AACR; Mol Cancer Ther 2013;12(11 Suppl):Abstract nr A22.
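
The core idea described in the abstract, superimposing mutations onto a family alignment and testing alignment columns for enrichment, can be illustrated with a minimal sketch. The abstract does not specify MutPfam's statistical test or background model, so everything below is an assumption made for illustration: the function names (`residue_to_column`, `binomial_upper_tail`, `family_hotspots`) are hypothetical, the gene names, alignment, and mutation list are toy data, and the null model simply assumes mutations fall uniformly across all aligned residues. Each column is scored with a one-sided binomial test followed by Bonferroni correction over the tested columns.

```python
from collections import Counter
from math import comb


def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns, skipping gaps."""
    mapping = {}
    pos = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def family_hotspots(alignment, mutations, alpha=0.05):
    """Return (column, count, corrected p-value) for significant columns.

    alignment: dict of gene -> gapped sequence, all of equal length
    mutations: iterable of (gene, 1-based residue position)
    Assumed null (not the authors' model): mutations are uniform over
    all non-gap residues in the alignment.
    """
    maps = {gene: residue_to_column(seq) for gene, seq in alignment.items()}
    counts = Counter(maps[gene][pos] for gene, pos in mutations)
    n_mut = sum(counts.values())

    # Number of non-gap residues in each column, and in the whole alignment
    n_cols = len(next(iter(alignment.values())))
    occupancy = [
        sum(seq[c] != "-" for seq in alignment.values()) for c in range(n_cols)
    ]
    total_residues = sum(occupancy)
    tested = [c for c in range(n_cols) if occupancy[c] > 0]

    hits = []
    for col in tested:
        p_null = occupancy[col] / total_residues
        p_val = binomial_upper_tail(counts[col], n_mut, p_null)
        p_adj = min(1.0, p_val * len(tested))  # Bonferroni correction
        if p_adj < alpha:
            hits.append((col, counts[col], p_adj))
    return sorted(hits, key=lambda h: h[2])


# Toy alignment of three hypothetical paralogs (not real Pfam data)
alignment = {
    "GENE_A": "MTEYKLVVVGAG",
    "GENE_B": "M-EYKLVVLGAG",
    "GENE_C": "MSEY-LVVVGSG",
}

# Toy mutations: each gene is hit only a few times, but the hits at
# GENE_A:11, GENE_B:10 and GENE_C:10 all map to the same alignment column
mutations = (
    [("GENE_A", 11)] * 3
    + [("GENE_B", 10)] * 2
    + [("GENE_C", 10)] * 2
    + [("GENE_A", 3), ("GENE_B", 5), ("GENE_C", 7)]
)

hits = family_hotspots(alignment, mutations)
for col, count, p_adj in hits:
    print(f"column {col}: {count} mutations, Bonferroni-adjusted p = {p_adj:.2e}")
```

The toy data show why pooling across paralogs helps: no single gene carries more than three mutations at the shared position, yet the pooled column count stands out against the assumed uniform background. A realistic analysis would replace the uniform null with a model of the local background mutation rate, which the abstract itself lists as an open issue.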