Flexible model-based non-negative matrix factorization with application to mutational signatures.
Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
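As a rough illustration of the unconstrained starting point that the abstract builds on, the sketch below factors a count matrix into exposures and signatures with the standard multiplicative updates for the generalized Kullback-Leibler divergence, which is the Poisson NMF objective. It is not the authors' parametrized di-nucleotide procedure: the function names (`matmul`, `gen_kl`, `poisson_nmf`), the initialization and the iteration count are assumptions made for this example only.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]


def gen_kl(V, WH):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / wh) - v + wh
            else:
                total += wh
    return total


def poisson_nmf(V, rank, n_iter=200, seed=0):
    """Factor counts V (samples x mutation types) as W @ H.

    W holds per-sample exposures (samples x rank) and H holds signatures
    (rank x mutation types). Each pass applies the multiplicative updates
    that do not increase the generalized KL divergence.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Hypothetical initialization: strictly positive random entries.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # Update the signatures H with W held fixed.
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for k in range(rank):
            w_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                H[k][j] *= sum(W[i][k] * R[i][j] for i in range(n)) / w_sum
        # Update the exposures W with H held fixed.
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for i in range(n):
            for k in range(rank):
                h_sum = sum(H[k]) + eps
                W[i][k] *= sum(R[i][j] * H[k][j] for j in range(m)) / h_sum
    return W, H


if __name__ == "__main__":
    # Toy data: 4 samples, 6 mutation types, generated from 2 made-up signatures.
    sigs = [[0.5, 0.3, 0.1, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.1, 0.3, 0.5]]
    expo = [[100, 10], [20, 80], [50, 50], [5, 200]]
    V = matmul(expo, sigs)
    W, H = poisson_nmf(V, rank=2)
    print("divergence after fitting:", gen_kl(V, matmul(W, H)))
```

A parametrized variant in the spirit of the abstract would presumably replace the free update of each row of `H` with a constrained regression step; that step is deliberately left out here because the abstract does not specify it in enough detail.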
- Front Matter
13
- 10.1053/j.gastro.2014.06.019
- Jun 25, 2014
- Gastroenterology
Mutational Signatures in Helicobacter pylori–induced Gastric Cancer: Lessons From New Sequencing Technologies
- Book Chapter
1
- 10.1002/9780470015902.a0023379
- Feb 15, 2012
- Encyclopedia of Life Sciences
Cancer genome sequencing studies have identified tens of thousands of somatic mutations in various human cancers to date. These data have started to generate new insights into mutation patterns and their differences between various cancers. Further, the mutation patterns have also helped to elucidate the mechanisms involved in generating mutations in the cancer genome, such as deoxyribonucleic acid (DNA)‐repair processes and mutagen exposures. With the introduction of next‐generation sequencing technologies, cancer genome sequencing has evolved from a targeted sequencing approach to whole‐exome sequencing and whole‐genome sequencing (WGS) approaches. However, each sequencing approach has its strengths and limitations. It is widely anticipated that WGS will eventually replace targeted and exome sequencing. WGS offers a unique advantage to study structural variants or rearrangements and fusion genes in a single experiment, in addition to point mutations. However, WGS is currently still prohibitively expensive for a large number of samples. Despite these technological advances, several challenges still remain, such as discerning driver mutations from benign mutations and collecting high‐quality primary tumour tissues to minimise tissue heterogeneity. Ultimately, a comprehensive delineation of the somatic mutations in the cancer genome would require WGS of a large number of samples from various cancer types and subtypes. In line with this goal, the International Cancer Genome Consortium was initiated and, upon completion of the project, its data are expected to further enhance our knowledge and understanding of the biological mechanisms underlying cancer development. Key Concepts: Several sequencing approaches are available to decipher the somatic mutation profile of the cancer genome, including targeted sequencing, whole‐exome sequencing (WES) and whole‐genome sequencing (WGS).
Targeted sequencing is a hypothesis‐driven approach in which candidate genes are often selected based on knowledge of previously reported mutated genes or their related functions in cancer development, such as the kinome or phosphatome. The ability to study different cancers and their subtypes at a comparatively higher throughput than exome sequencing and WGS is an important advantage of the targeted sequencing approach. The introduction of multiple commercial exome enrichment kits has circumvented the technical challenges in isolating the entire exome for sequencing. Although WES interrogates only approximately 1–2% of the entire human genome, the several hundred somatic mutations identified in WES studies are already too numerous for all of them to be examined by follow‐up studies in larger sample series. The advances of NGS technologies and rapidly declining sequencing costs have enabled the completion of a number of WGS studies for various cancers. The patterns of somatic mutations provided by WGS are useful in elucidating the mechanisms that generate mutations in the cancer genome, such as DNA‐repair processes and differences in mutagen exposure. A further advantage of WGS over the other two sequencing approaches is its ability to identify structural rearrangements. Although WES and WGS approaches have now been shown to be technically feasible, several challenges still remain. Delineating patterns of somatic mutations in the cancer genome requires a comprehensive interrogation of cancer genomes (entire genome, exome or a large number of candidate genes) in series of up to hundreds of samples.
- Abstract
126
- 10.1186/gb-2011-12-s1-p3
- Jan 1, 2011
- Genome Biology
The Catalogue Of Somatic Mutations In Cancer (COSMIC) [1] is one of the largest repositories of information on somatic mutations in human cancer. The project has been running for more than ten years as part of the Cancer Genome Project (CGP) at the Wellcome Trust Sanger Institute in the UK. The data in COSMIC are curated from a variety of sources, primarily the scientific literature and large international consortia. The project includes information from the CGP, along with data from other consortia such as the International Cancer Genome Consortium and The Cancer Genome Atlas. In addition, COSMIC is regularly updated with the genes highlighted in the Cancer Gene Census, which curates the scientific literature for known cancer genes [2]. With the advent of whole exome and genome sequencing technology, the amount of data in COSMIC is increasing rapidly. The recent COSMIC release (version 53; 18 May 2011) contains 608,042 tumor and cell line samples, annotating 176,856 mutations across 19,439 genes, with 352 full exomes, 43 whole genome rearrangement screens and 4 full genomes now available. The data are updated regularly, with new releases scheduled every two months. COSMIC provides a large number of graphical and tabular views for interpreting and mining the large quantity of information, as well as the facility to export the relevant data in various formats. The website can be navigated in many ways to examine mutation patterns on the basis of genes, samples and phenotypes, which are the main entry points to COSMIC. COSMIC also provides various options to browse the data in a genomic context. Integration with the Ensembl genome browser allows the visualization of full genome annotations, together with COSMIC data, on the GRCh37 genome coordinates. 
COSMIC also contains its own genome browser, which facilitates data analysis by combining genome-wide gene structures and sequences with rearrangement breakpoints, copy number variations and all somatic substitutions, deletions, insertions and complex gene mutations. The main COSMIC website [1] encompasses all of the available data. However, within COSMIC, the Cancer Cell Line Project [3] is a specialized component, which provides details of the genotyping of almost 800 commonly used cancer cell lines, through the set of known cancer genes. Its focus is to identify driver mutations, or those likely to be implicated in the oncogenesis of each tumor. This information forms the basis for integrating COSMIC with the Genomics of Drug Sensitivity in Cancer project [4], which is a joint effort with the Massachusetts General Hospital [5] to screen this panel of cancer cell lines against potential anticancer therapeutic compounds to investigate correlations between somatic mutations and drug sensitivity. Data on somatic mutations in cancer are being produced at a rapidly increasing rate, and the combined analysis of large distributed datasets is becoming ever more difficult. However, COSMIC curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.
- Abstract
- 10.1186/1471-2105-14-s17-a15
- Oct 1, 2013
- BMC Bioinformatics
Background: Somatic mutation is the key element of tumorigenesis, as these changes in the nucleotide sequence of the cancer genome, acquired in somatic cells throughout life, can lead to protein alteration and cellular damage and thus cause cancer. The advent of next generation sequencing has significantly improved our ability to identify somatic mutations in cancer genomes, paving the way for the comprehensive online Catalogue of Somatic Mutations in Cancer (COSMIC) [1], which contains more than 820,000 mutations so far. Nevertheless, there are still many challenges in detecting somatic mutations in cancer, especially low-frequency mutations, due to either tumor heterogeneity or contamination with normal cells. Here, in this short tutorial, I will present a recent somatic mutation caller developed by the Broad Institute, called Mutect 2, which is part of the GATK (Genome Analysis Toolkit). I will use my own NGS dataset to demonstrate the tool and address some issues of troubleshooting input data and interpreting output.
- Research Article
- 10.1200/jco.2023.41.16_suppl.e17629
- Jun 1, 2023
- Journal of Clinical Oncology
e17629 Background: Whole genome sequencing (WGS) has emerged as a tool to characterize mutational burden and potentially identify new therapeutic targets and predictive biomarkers. In this study, we describe the mutational landscape of diverse patients with endometrial cancer using WGS and explore associations between mutations and clinicopathologic characteristics. Methods: Patients with endometrial cancer were prospectively consented and WGS was performed on tumors and blood. Using the New York Genome Center Cancer pipeline, we identified high confidence somatic mutations in our patient cohort (identified by 2 or more variant callers). Somatic variants were categorized into mutational signatures based on the Catalog of Somatic Mutations in Cancer (COSMIC) v2. Associations between COSMIC signatures and clinicopathologic variables were evaluated using ANOVA tests. Results: Sixteen patients with endometrial cancer were enrolled; 10 (62.5%) self-identified as White, 4 (25.0%) as Black, and 2 (12.5%) as Asian. Tumor histology was endometrioid (13, 81.3%) or serous/clear cell (2, 12.5%), with both low grade (9, 56.3%) and high grade (7, 43.8%) tumors. The most frequently mutated genes were PTEN (15, 93.8%), ARID1A (9, 56.3%), PIK3CA (9, 56.3%), and TP53 (6, 37.5%). PIK3CA mutations were found in all mismatch repair (MMR) deficient tumors (5), compared to only 30.0% (3) of patients with intact MMR mechanisms (FDR p = 0.0256). Signature 3, associated with BRCA mutations and loss of DNA double-strand break-repair by homologous recombination, was enriched in p53 mutated tumors (p = 0.0036). Signature 1, associated with age, was enriched in patients with MMR intact tumors (p = 0.0475), while signatures 6 and 20, associated with defective DNA MMR and microsatellite unstable tumors, were enriched in patients with MMR deficient tumors (p < 0.0001 and p = 0.0022, respectively).
Signature 8, of unknown etiology, was enriched in patients with MMR intact tumors, advanced disease (p = 0.0045), disease spread to lymph nodes (p = 0.0231), and disease recurrence (p = 0.0021). Signature 12, also of unknown etiology, was enriched in MMR deficient tumors (p = 0.0005) and stage 1 disease (p = 0.0235). Conclusions: WGS revealed distinct patterns of mutational burden associated with clinicopathologic variables. MMR status was associated with PIK3CA mutations and multiple mutational signatures. Mutational signature 8 was associated with multiple clinicopathologic characteristics associated with poor prognosis, suggesting utility as a biomarker. Further investigation regarding the mechanism of the mutational signatures and correlation with prognostic factors may shed light on disparate biologic endometrial cancer outcomes and serve as a source of hypothesis-generation for the development of targeted therapies.
- Front Matter
31
- 10.1152/physiolgenomics.00009.2016
- Jan 26, 2016
- Physiological Genomics
The primary use of the traditional candidate gene approach over the past decades in sports genetics has had limited success in identifying genes associated with elite athletic performance. Advances in high-throughput technologies now permit the application of "omics" (e.g. genomics, transcriptomics, metabolomics, proteomics and epigenomics) approaches to examine the global features of a cell, tissue or organism. "Omics" approaches are being applied with some success to a wide range of pertinent biomedical problems such as cancer diagnosis but also in sports science and sports medicine such as for the identification of biomarkers of trainability or blood doping. There is good evidence to suggest that a combined "omics" solution will greatly facilitate discovery of the genetic influences on sporting performance, training response, injury predisposition and other potential determinants of successful human performance. In this regard, large-scale, collaborative efforts involving well-phenotyped cohorts will be essential for major progress to be made. A recent consensus emerged among 15 research groups active in the field of sports genetics to unite their efforts under one new collaborative initiative named the Athlome Project Consortium. The primary aim of the Athlome Project is to combine resources from individual studies and consortia worldwide to collectively study the genotype and phenotype data available on elite athletes, the adaptation to exercise training (in both human and animal models) and the determinants of exercise-related musculoskeletal injuries. This editorial summarizes the challenges and opportunities facing the Athlome Project Consortium and the field of sports and exercise genomics in general.
- Research Article
- 10.1158/1538-7445.am2014-4267
- Sep 30, 2014
- Cancer Research
Data within the Catalogue of Somatic Mutations in Cancer indicate that a number of different hotspot oncogene mutations occur infrequently in breast tumors as compared to other tumor types (e.g., colon and lung). We hypothesized that greater sensitivity is required to detect somatic mutations in breast tumors, due to their highly heterogeneous clonal structure. Therefore, the Allele-specific Competitive Blocker PCR method (with a sensitivity of 10^-5) was used to quantify mutations in DNA isolated from normal human breast and ductal carcinoma (DC) samples (n = 10). The specific mutations quantified were: KRAS codon 12 GGT to GAT (G12D) and GGT to GTT (G12V), HRAS codon 12 GGC to GAC (G12D), BRAF codon 600 GTG to GAG (V600E), and PIK3CA codon 1047 CGT to CAT (H1047R). In addition, comparison of normal and malignant tissue was employed in order to define the lower bound on abnormal levels of mutation, as tumors that possess a mutant fraction (MF) greater than the upper 95% confidence interval of that present in normal breast. Interestingly, different patterns of mutation were observed for the different hotspot point mutations. We previously demonstrated that significantly higher levels of KRAS mutation are present in colon and lung tumors relative to the cognate normal tissue; however, the frequencies of KRAS codon 12 GGT to GAT and GGT to GTT mutation in normal breast and in DCs were quite similar (∼10^-5 for the KRAS codon 12 GGT to GAT mutation in both tissues). Only one normal sample and two DCs had BRAF codon 600 GTG to GAG MFs >10^-5. With respect to HRAS codon 12 GGC to GAC mutation, however, ∼80% of DCs had a MF greater than the upper 95% confidence interval of that present in normal breast tissue. Importantly, no breast tumor had a HRAS codon 12 GGC to GAC MF ≥10^-1 (i.e., that detectable by DNA sequencing).
Unexpectedly high levels of PIK3CA codon 1047 CGT to CAT mutation (>10^-2) were observed in 4/10 normal breast samples and 4/10 DCs, suggesting that preexisting PIK3CA mutations in normal breast contribute to breast cancer susceptibility. In summary, all DC samples analyzed contained somatic mutations, none of which would be detectable by DNA sequencing. Consequently, these results indicate somatic mutations are more prevalent in breast cancer than is currently recognized. Ongoing analyses of additional tumors, corresponding to different breast cancer subtypes, will provide a better understanding of the role of somatic mutations in breast cancer. Citation Format: Malathi Banda, Meagan B. Myers, Karen L. McKim, Yiying Wang, Barbara L. Parsons. Quantification of somatic hotspot mutations in KRAS, HRAS, BRAF, and PIK3CA: Comparing normal human breast and ductal carcinoma. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 4267. doi:10.1158/1538-7445.AM2014-4267
- Research Article
- 10.1158/1538-7445.sabcs18-p3-06-06
- Feb 15, 2019
- Cancer Research
Background: Our understanding of the biological processes that generate somatic mutations in breast cancer has increased markedly over the past five years. Using the catalog of somatic mutations present in cancer genomes, over 30 “mutational signatures” have been produced. While these provide important insights into the processes responsible for somatic mutation, gaps remain, and the etiology of several signatures remains unknown. Methods: We have developed a new method in which the specific nucleotide change (e.g., C>T), the codon that each mutation falls in (e.g., GCT), the position in the codon (e.g., 2), and the nucleotides immediately 5' and 3' of the mutation (e.g., 5': C; 3': G) are all considered. The summary of these mutation characteristics forms a mutational profile for each tissue sample. Putting multiple samples' profiles together forms a sparse matrix with the samples as rows and the mutation characteristics as columns. Nonsmooth nonnegative matrix factorization was then applied to enable the discovery of intrinsic patterns in this sparse matrix. Results: Using somatic mutations identified in 1017 breast cancer tissues from The Cancer Genome Atlas (TCGA), we have identified four mutational signatures. Signature A correlates with the well-defined APOBEC signatures and Signature B with the “aging” signature, which is the result of 5-methylcytosine hydrolysis. Signature C and Signature D are potentially new signatures. Signature D is enriched with C:G>A:T mutations; these mostly occur in the middle position of codons, and are enriched with GG(CC) either 5' or 3' of the mutation's sequence context. G>T mutations are known to occur as a consequence of oxidative damage that is not repaired. Guanines are vulnerable as they have the highest vertical oxidation potential of the nucleobases. The 5' guanines in GG sites are especially reactive. We hypothesize that Signature D results from oxidative mutagenesis.
When correlated with clinical phenotypes, the basal subtype is clearly enriched for tumors with the Signature D mutation pattern (exposure level is in 169 basal tumors and in 797 non-basal tumors, p < 0.01), suggesting an etiologic link with basal-like breast cancer. G>T somatic mutations in breast cancer mainly take place during cell replication rather than during transcription. In the normal breast, epithelial cell replication occurs during the luteal phase of the menstrual cycle and during pregnancy, primarily under the direction of progesterone (P). P binds to its receptor (PR) in a subpopulation of PR positive cells where it initiates the transcription of genes including RANKL, with resultant paracrine stimulation (through RANK) of the NF-κB signaling pathway in neighboring cells. The four RANKL genes (TOP2A, MKI67, PBK, CDK1), defined by Nolan et al., are all positively associated with Signature D (P-values < 0.05), suggesting that this type of mutagenesis is associated with RANKL pathway upregulation. Conclusions: We have identified a potentially new somatic mutational signature, which we have designated as Signature D, which appears to result from exposure of DNA to oxidative stress during replication. It is associated with the basal subtype of breast cancer as well as RANKL-NF-κB pathway upregulation. Citation Format: Zeng Z, Vo AH, Luo Y, Khan SA, Clare SE. Novel breast cancer mutational signatures [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr P3-06-06.
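The profile construction described in the Methods of this abstract (nucleotide change, codon, codon position and flanking bases combined into one mutation characteristic, then tabulated per sample) can be sketched roughly as follows. The record layout, the feature-label format and the function names `mutation_feature` and `build_profile_matrix` are hypothetical choices for illustration and are not taken from the abstract; the nonsmooth NMF step itself is not shown.

```python
from collections import Counter


def mutation_feature(change, codon, codon_pos, five_prime, three_prime):
    """Combine the mutation characteristics into one column label (hypothetical format)."""
    return f"{change}|{codon}|{codon_pos}|{five_prime}_{three_prime}"


def build_profile_matrix(records):
    """Tabulate mutation records into a samples x features count matrix.

    Each record is assumed to be a tuple:
    (sample_id, change, codon, codon_pos, five_prime, three_prime).
    Returns the sorted sample ids, the sorted feature labels and the matrix
    as a list of rows, one row per sample.
    """
    counts = {}
    features = set()
    for sample, *characteristics in records:
        feature = mutation_feature(*characteristics)
        counts.setdefault(sample, Counter())[feature] += 1
        features.add(feature)
    samples = sorted(counts)
    columns = sorted(features)
    # Counter returns 0 for features a sample never showed, giving the sparse zeros.
    matrix = [[counts[s][c] for c in columns] for s in samples]
    return samples, columns, matrix


if __name__ == "__main__":
    # Made-up records using the example values quoted in the abstract.
    records = [
        ("S1", "C>T", "GCT", 2, "C", "G"),
        ("S1", "C>T", "GCT", 2, "C", "G"),
        ("S2", "C>A", "GCC", 2, "G", "G"),
    ]
    samples, columns, matrix = build_profile_matrix(records)
    for sample, row in zip(samples, matrix):
        print(sample, row)
```

Because most samples carry only a few of the many possible characteristic combinations, the resulting matrix is mostly zeros; a production version would likely store it in a sparse format rather than as nested lists.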
- Research Article
19
- 10.1186/s12920-021-01017-7
- Jun 22, 2021
- BMC Medical Genomics
Background: DNA polymerase epsilon (POLE) is encoded by the POLE gene, and POLE-driven tumors are characterized by high mutational rates. POLE-driven tumors are relatively common in endometrial and colorectal cancer, and their presence is increasingly recognized in ovarian cancer (OC) of endometrioid type. POLE-driven cases possess an abundance of TCT > TAT and TCG > TTG somatic mutations characterized by mutational signature 10 from the Catalog of Somatic Mutations in Cancer (COSMIC). By quantifying the contribution of COSMIC mutational signature 10 in RNA sequencing (RNA-seq) we set out to identify POLE-driven tumors in a set of unselected Mayo Clinic OC. Methods: Mutational profiles were calculated using expressed single-nucleotide variants (eSNV) in the Mayo Clinic OC tumors (n = 195), The Cancer Genome Atlas (TCGA) OC tumors (n = 419), and the Genotype-Tissue Expression (GTEx) normal ovarian tissues (n = 84). Non-negative Matrix Factorization (NMF) of the mutational profiles inferred the contribution per sample of four distinct mutational signatures, one of which corresponds to COSMIC mutational signature 10. Results: In the Mayo Clinic OC cohort we identified six tumors with a predicted contribution from COSMIC mutational signature 10 of over five mutations per megabase. These six cases harbored known POLE hotspot mutations (P286R, S297F, V411L, and A456P) and were of endometrioid histotype (P = 5e−04). These six tumors had an early onset (average age of patients at onset, 48.33 years) when compared to non-POLE endometrioid OC cohort (average age at onset, 60.13 years; P = .008). Samples from TCGA and GTEx had a low COSMIC signature 10 contribution (median 0.16 mutations per megabase; maximum 1.78 mutations per megabase) and carried no POLE hotspot mutations. Conclusions: From the largest cohort of RNA-seq from endometrioid OC to date (n = 53), we identified six hypermutated samples likely driven by POLE (frequency, 11%).
Our result suggests the clinical need to screen for POLE driver mutations in endometrioid OC, which can guide enrollment in immunotherapy clinical trials.
- Research Article
- 10.1158/1538-7445.am2019-906
- Jul 1, 2019
- Cancer Research
COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk/cosmic) is a continual effort to integrate all available information on somatic mutations and other molecular alterations causing every form of human cancer. Being the world’s largest and most comprehensive database of somatic mutations in human cancer, it also provides web-based tools for exploration and interpretation of collected data. The content of the database is primarily obtained from the scientific literature by the team of experienced post-doctoral curators and combined with information from online sources, including the TCGA and ICGC. During thorough & exhaustive manual curation, all the available information about mutations and tumor samples (e.g. disease type, demographic data, treatments) are collected, standardized and integrated to allow for both creation of wide virtual cohorts and large-scale studies, as well as precise analysis at the level of a single sample, gene or mutation. The 87th release of COSMIC (Nov 2018) encompasses 5,992,260 coding mutations and 19,574 gene fusions, curated from 1,403,267 cancer samples, including 35,490 whole cancer exomes/genomes, primarily hand-curated data from 26,494 scientific publications. Additionally, COSMIC describes 1,179,545 Copy Number Variants, 9,147,833 gene expression variants, and 7,879,142 differentially methylated CPGs. In addition to this broad database, COSMIC includes a range of specialized projects highlighting specific aspects of cancer in order to emphasize events with a higher impact in disease etiology. This includes the Cancer Gene Census (http://cancer.sanger.ac.uk/census), which defines and describes genes (currently 719) and their dysfunctions driving oncogenesis, and characterizes their impact on hallmarks of cancer. COSMIC3D (http://cancer.sanger.ac.uk/cosmic3d) provides an interactive view of cancer mutations in the context of 3D protein structures, and predicts potential drug-binding sites. 
Significantly updated 4 times a year, COSMIC is available free-of-charge for academic and non-profit users via COSMIC webpage (http://cancer.sanger.ac.uk/cosmic) or to download through COSMIC downloads (http://cancer.sanger.ac.uk/cosmic/download). Citation Format: Simon Andrew Forbes, david beare, charalampos boutselakis, sally bamford, kate noble, claire rye, john tate, chai yin kok, charlie hathaway, laura ponting, christopher ramshaw, raymund stefancsik, samantha thompson, bhavana harsha, nidhi bindal, shicai wang, steven jupe, helen speedy, celestino creatore, peter fish, sari ward, charlotte cole, elisabeth dawson, zbyslaw sondka. COSMIC: Describing the world’s knowledge of somatic mutations in cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 906.
- Research Article
- 10.1158/1538-7445.am2014-5326
- Sep 30, 2014
- Cancer Research
COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk) is the world's largest and most comprehensive online resource for exploring the impact of somatic mutations in human cancer. Now running for over 10 years, the 67th release (Oct 2013) describes 1592109 mutations in 947213 tumour samples across 25606 genes. This information is curated manually from the scientific literature, and automatically from genome resequencing consortium data portals. Full curation of the scientific literature provides in-depth understanding of the impact that each gene has in human cancer, and this has been achieved for 127 point-mutated cancer genes, and 185 fusion gene pairs. Curated genes are selected from the Cancer Gene Census (http://cancer.sanger.ac.uk/census), a listing of all genes with substantial evidence implicating them in cancer promotion, currently numbering 513 and updated frequently. The mutations discovered in the re-sequencing of over 8000 tumour genomes are now present in COSMIC (viewable in isolation from the genic curations, http://cancer.sanger.ac.uk/wgs). In addition, the Sanger has now fully exome sequenced 1015 common cancer cell lines, identifying 1146874 coding mutations annotated for functional significance, and this is available exclusively in COSMIC at (http://cancer.sanger.ac.uk/cell_lines). While COSMIC has focused on point mutations and gene fusions, many other mutation mechanisms cause oncogenesis and these are now being integrated. The 67th COSMIC release includes copy number mutations integrated into the database and major web page views. To allow easy graphical examination of this data, copy number information was reduced to ‘gain’ and ‘loss’ annotations for inclusion in histograms and tables, with much more precise detail available with a further click. Copy number data is available in detail for every gene in COSMIC, and also for every tissue. 
Exploring cancer via COSMIC’s Cancer Browser (http://cancer.sanger.ac.uk/cosmic/browse/tissue), results not just in a plot of the most mutated genes, but now also a circular genome plot summarizing the copy number gains and losses across all the samples from that tumour type, all explorable in more detail via clicks on the pictures. As the genomic data increases in COSMIC, it is becoming more important to qualitatively annotate the information, indicating which is more important or significant to oncogenesis. We are now building systems to better highlight known or putative functional mutations, improving the signal-to-noise ratio of cancer genome resequencing. Citation Format: C Boutselakis, S A. Forbes, P Gunasekaran, M Jia, D Beare, N Bindal, C Y. Kok, K Leung, D Minjie, R Shepherd, S Bamford, S Ward, C Cole, J W. Teague, M Stratton, P Campbell, U McDermott. COSMIC: Enhancing the world's knowledge of somatic mutations in human cancer. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 5326. doi:10.1158/1538-7445.AM2014-5326
- Research Article
- 10.1158/1538-7445.am2012-3959
- Apr 15, 2012
- Cancer Research
COSMIC, the Catalog Of Somatic Mutations In Cancer (www.sanger.ac.uk/cosmic) is a single resource combining much of the available data on somatic mutations in human cancer. This information is obtained from a number of online sources, including the TCGA and ICGC webportals, the IARC p53 database and the Cancer Genome Project, Sanger Institute, UK; this is then combined with information carefully curated from the scientific literature. Over 20,000 genes have been screened in the search for cancer causing mutations, and details have been curated for all of these, resulting in the annotation of 213,615 mutations across 665,763 tumor samples (COSMIC v56, Nov 2011). The Cancer Gene Census lists all known cancer genes and these are prioritized for curation, generating extensive mutation spectra across 107 point-mutated genes and 111 fusion gene pairs. Increasing whole genome/exome data are also available; currently 559 samples detail 2753 genome rearrangements, 21892 coding and 57396 non-coding point mutations. Most importantly, this information is all rigorously maintained and regularly updated. With new releases every two months, COSMIC is very responsive to the scientific literature, detailing all the latest findings. The data in COSMIC can be examined in a number of different ways. The website has been designed to allow easy graphical browsing of the data, with many mining options to filter desired information. A COSMIC genome browser can be used to visualize COSMIC mutations in a genomic context, and a Biomart is available to download datasets federated with external databases. In addition, the complete data is available for free download for offline analysis. The combination of data from multiple sources in COSMIC allows its use as a discovery tool as well as a reference datasource; substantial mining can be performed between mutant genotypes and clinical phenotypes, searching for novel recurrence patterns between tissues, genes and individual variants. 
Overall, the top mutated gene reported in human cancer is JAK2, with a published mutation rate of 38% (n=76853), however a phenotype correlation suggests a mutation rate of up to 86% in Polycythaemia Vera, 58% in Myelofibrosis and 55% in Essential Thrombocythaemia. Similarly, KRAS with an overall mutation rate of 23% (n=96055) increases to 69% in Pancreatic ductal carcinomas, and a single report suggests a 70% mutation frequency for MED12 in Uterine Leiomyomas (n=225). In addition to these well characterized cancer genes, new whole-genome screens are beginning to identify many more. For instance the TCGA Ovarian exome screen suggested roles for genes such as CSMD3, HMCN1 and USH2A, and combining multiple whole genome screens suggests roles for further new genes such as BAI3 (13% mutated in lung). As yet, novel recurrent mutations are challenging to identify across studies, but rapidly increasing amounts of whole genome data will make COSMIC a key resource in this search for new therapeutic targets. Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 3959. doi:1538-7445.AM2012-3959
- Research Article
55
- 10.1002/ijc.22822
- Jul 23, 2007
- International Journal of Cancer
The complete mitochondrial genomes of the primary cancerous, matched paracancerous normal and distant normal tissues from 10 early-stage breast cancer patients were analyzed in this study, with two specific aims: (i) to investigate whether the reported high frequency of mitochondrial DNA (mtDNA) somatic mutations in breast cancer could be reproduced under stringent data quality control, and (ii) to characterize the spectrum of mtDNA somatic mutations in Chinese breast cancer patients and evaluate their potential significance in early cancer diagnosis. Two heteroplasmic somatic transitions (T2275C and A8601G) were identified in our samples. The transition A8601G was present in the primary cancerous and paracancerous normal tissues from patient no. 3. Transition T2275C was found in the primary cancerous tissue but not in other normal tissues from patient no. 6; this transition has been reported in the colonic crypts and is located at a highly conserved site in the 16S rRNA gene. Subsequent cloning and sequencing confirmed the absence of both mutations in the distant normal tissues from the two patients. The overall rate of somatic mutations in our patients was much lower than that reported in previous studies of breast cancer. Our results support the recent claim that the high frequency of mtDNA somatic mutations in cancer studies is overestimated. Based on the mtDNA mutation pattern in early-stage breast cancer observed in this study, we caution against excessive enthusiasm for efforts to find somatic mutations of diagnostic value in early cancer detection.
- Research Article
20
- 10.3892/ijo.2013.1991
- Jun 25, 2013
- International Journal of Oncology
Squamous cell lung cancer is a major histotype of non-small cell lung cancer (NSCLC) that is distinct from lung adenocarcinoma. We used whole-exome sequencing to identify novel non-synonymous somatic mutations in squamous cell lung cancer. We identified 101 single-nucleotide variants (SNVs) including 77 non-synonymous SNVs (67 missense and 10 nonsense mutations) and 11 INDELs causing frameshifts. We also found four SNVs located within splicing sites. We verified 62 of the SNVs (51 missense, 10 nonsense and 1 splicing-site mutation) and 10 of the INDELs as somatic mutations in lung cancer tissue. Sixteen of the mutated genes were also mutated in at least one patient with a different type of lung cancer in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Four genes (LPHN2, TP53, MYH2 and TGM2) were mutated in approximately 10% of the samples in the COSMIC database. We identified two missense mutations in C10orf137 and MS4A3 that also occurred in other solid-tumor tissues in the COSMIC database. We found another somatic mutation in EP300, a gene that was mutated in 4.2% of the 2,020 solid-tumor samples in the COSMIC database. Taken together, our results implicate TP53, EP300, LPHN2, C10orf137, MYH2, TGM2 and MS4A3 as potential driver genes of squamous cell lung cancer.
- Research Article
88
- 10.1186/s13073-021-00883-1
- Apr 29, 2021
- Genome Medicine
Background: Identification of germline variation and somatic mutations is a major issue in human genetics. However, due to the limitations of DNA sequencing technologies and computational algorithms, our understanding of genetic variation and somatic mutations is far from complete. Methods: In the present study, we performed whole-genome sequencing using long-read sequencing technology (Oxford Nanopore) for 11 Japanese liver cancers and matched normal samples which were previously sequenced for the International Cancer Genome Consortium (ICGC). We constructed an analysis pipeline for the long-read data and identified germline and somatic structural variations (SVs). Results: In polymorphic germline SVs, our analysis identified 8004 insertions, 6389 deletions, 27 inversions, and 32 intra-chromosomal translocations. By comparing to the chimpanzee genome, we correctly inferred events that caused insertions and deletions and found that most insertions were caused by transposons, with Alu the most predominant source, while other types of insertions, such as tandem duplications and processed pseudogenes, are rare. We inferred the mechanisms that generate deletions and found that most non-allelic homologous recombination (NAHR) events were caused by recombination errors in SINEs. Analysis of somatic mutations in liver cancers showed that long reads could detect larger numbers of SVs than a previous short-read study and that the mechanisms of cancer SV generation were different from those of germline deletions. Conclusions: Our analysis provides a comprehensive catalog of polymorphic and somatic SVs, as well as their possible causes. Our software is available at https://github.com/afujimoto/CAMPHOR and https://github.com/afujimoto/CAMPHORsomatic.