Abstract

Genome sequencing studies reveal an overwhelming number of mutations in cancer. Chromosomal losses and gains, as well as missense and nonsense point mutations that alter tumor suppressor genes and oncogenes, are widely studied. However, genetic changes that leave the protein sequence intact can also significantly impact cancer genes (1), e.g. by affecting the translation, RNA structure or stability of the mutated transcript. Here, we analyze 3.88 million mutations identified in whole genome sequencing studies of cancer tissues and cell lines, including but not limited to TCGA and ICGC data. After curation of the dataset derived from the Catalogue of Somatic Mutations In Cancer (COSMIC) for duplicate entries and annotation errors, 2.81 million mutations remain in 20,414 human genes from 18,028 samples representing 88 different tumor entities. In this dataset, we find 659,191 synonymous mutations, which alter the nucleotide sequence but not the amino acid sequence of the respective protein due to the redundancy of the genetic code. Hence, synonymous mutations are the second most frequent type of mutation (23.1%) after missense mutations (64.1%), and much more frequent than nonsense mutations, deletions or insertions (4.3%, 3.2% and 1.4%, respectively). While the latter are widely characterized as tumor causing, synonymous mutations have hardly been studied at all, in part due to the lack of a comprehensive and searchable resource. Based on our platform, we compare synonymous (syn) and missense (mis) mutations and find striking parallels, making it likely that at least some syn mutations have a similar impact on tumorigenesis. A total of 176,590 syn mutations are found recurrently, a recurrence fraction similar to that of mis mutations (26.8% vs. 29.1%). Known cancer genes from the Cancer Gene Census (2.8% of all genes) are enriched in syn as well as in mis mutations (3.8% vs. 4.8%); in turn, more than 95% of both types of mutations are found in genes not yet associated with cancer, leaving a lot of room for discoveries. Somatic syn and mis mutation catalogs contain a similar fraction of known single nucleotide polymorphisms (SNPs; 8.1% vs. 8.3%). Notably, syn as well as mis mutations are significantly depleted in the first 5% of the coding sequence, indicating potential selection due to their impact on translation initiation (syn) or on N-terminal signal sequences or misfolding (mis). Importantly, we added conservation scores for each affected nucleotide, which may reflect its functional relevance or its localization in a regulatory motif; again, syn and mis mutations were equally conserved (6.6% vs. 6.6% with PhastCons >0.9). The mutational patterns of syn and mis mutations are similar, with C>T/G>A changes being the most frequent (67.2% vs. 49.3%). Syn mutations are not randomly distributed across the codons: Arg codons are under-represented, while Phe codons are over-represented. Our comprehensive dataset is available to the scientific community in a user-friendly database: SynMICdb (www.SynMICdb.org), the Synonymous Mutations In Cancer database. It also allows non-bioinformaticians to search for synonymous mutations according to their frequency, presence in specific tumor entities or evolutionary conservation. To search for synonymous mutations potentially affecting translation initiation, elongation or termination, researchers can select mutations based on their localization within the coding sequence. In summary, SynMICdb offers the first comprehensive resource enabling research on synonymous mutations in cancer and provides important insights into the characteristics of this abundant but frequently overlooked class of mutations.
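
To make the distinction between synonymous, missense and nonsense mutations concrete, the following is a minimal Python sketch and not the pipeline used to build SynMICdb. It assumes the standard genetic code (NCBI translation table 1) and codons given on the coding strand. The function names `classify_codon_change` and `in_first_fraction` are hypothetical, and the way the "first 5% of the coding sequence" is computed here (1-based nucleotide position divided by coding sequence length) is an assumption for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in the order T, C, A, G at each position; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution (hypothetical helper, coding strand)."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref == alt:
        raise ValueError("reference and mutated codon are identical")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        # Same amino acid (or stop retained): the protein is unchanged.
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


def in_first_fraction(cds_position: int, cds_length: int,
                      fraction: float = 0.05) -> bool:
    """True if a 1-based CDS position lies in the first `fraction` of the CDS.

    Assumption for illustration only; the abstract does not state how the
    first 5% of the coding sequence was delimited.
    """
    if not 1 <= cds_position <= cds_length:
        raise ValueError("position outside the coding sequence")
    return cds_position <= fraction * cds_length


if __name__ == "__main__":
    print(classify_codon_change("CTG", "CTA"))  # Leu -> Leu
    print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
    print(classify_codon_change("CGA", "TGA"))  # Arg -> stop
    print(in_first_fraction(30, 1000))
```

In this sketch, a change such as CTG to CTA is reported as synonymous because both codons encode leucine, which is the redundancy of the genetic code referred to above.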
