Discriminating activating, deactivating and resistance variants in protein kinases

Abstract

Background: Distinguishing whether genetic variants in protein kinases cause gain or loss of function is critical in clinical genetics. In particular, gain-of-function (but not loss-of-function) variants are often immediately amenable to treatment by inhibitors, making their identification a potential boon to personalised medicine. Most existing computational methods for variant pathogenicity prediction simply distinguish damaging from benign variants and provide no further functional insights. Here, we present a data-driven approach that differentiates activating, deactivating, and resistance variants.

Methods: To train and evaluate our method, we curated a dataset of 2505 variants (375 activating, 1028 deactivating, 98 resistance and 1004 neutral) across 441 kinases from the literature and public databases. Each variant was represented as a vector of sequence, evolutionary and structural features, which we then used to train machine learning models to distinguish among the four types of variants. The resulting predictors achieved excellent performance (mean AUC = 0.941). We tested a selection of variants by over-expression in T-REx-293 cells followed by gene expression or biochemical tests.

Results: Applying the predictors to uncharacterised variants, we observed a strong enrichment of activating mutations in cancer genomes, deactivating variants in hereditary disease, and few of either in variants from healthy individuals. We experimentally validated several predicted activating variants from cancer samples. For p.Ser97Asn in PIM1, phosphorylation events suggested increased activity. For p.Ala84Thr in MAP2K3, gene expression and mitochondrial staining showed a reduction in mitochondrial function, the opposite effect of MAP2K3 deletions. We provide an online application that enables users to analyse any kinase-domain variant, obtain prediction scores and explore known nearby variants in other kinases.

Conclusions: Our predictors, together with the rapid experimental validations, demonstrate a feasible strategy for identifying activating variants in kinases in a time frame that would enable clinical decisions.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-025-01564-z.
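The abstract reports a single "mean AUC" for models that separate four variant classes, without saying how the mean is formed. As a minimal sketch, assuming the mean is an unweighted average of one-vs-rest AUCs (an assumption, not something the abstract states), the calculation could look like the following. The function names and the toy scores are hypothetical and invented for illustration; only the class counts come from the abstract.

```python
from typing import Dict, List, Sequence

# Class sizes as reported in the abstract (they sum to 2505 variants).
CLASS_COUNTS = {"activating": 375, "deactivating": 1028, "resistance": 98, "neutral": 1004}


def binary_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_one_vs_rest_auc(scores_by_class: Dict[str, List[float]], true_labels: List[str]) -> float:
    """Unweighted average of the one-vs-rest AUC over every scored class (assumed scheme)."""
    aucs = [
        binary_auc(scores, [label == cls for label in true_labels])
        for cls, scores in scores_by_class.items()
    ]
    return sum(aucs) / len(aucs)


# Toy, invented predictions for six variants; these are not data from the paper.
toy_labels = ["activating", "activating", "deactivating", "deactivating", "neutral", "neutral"]
toy_scores = {
    "activating": [0.9, 0.6, 0.2, 0.1, 0.7, 0.1],
    "deactivating": [0.1, 0.2, 0.8, 0.7, 0.1, 0.3],
    "neutral": [0.0, 0.2, 0.0, 0.2, 0.2, 0.6],
}
toy_mean_auc = mean_one_vs_rest_auc(toy_scores, toy_labels)
```

With classes as unbalanced as those in the curated dataset (98 resistance variants against 1028 deactivating ones), an unweighted average gives the small resistance class the same weight as the large ones; a prevalence-weighted mean would be an equally plausible reading of "mean AUC".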

Similar Papers
  • Research Article
  • Cite Count Icon 29
  • 10.1002/emmm.201202388
Journeys into the genome of cancer cells
  • Jan 22, 2013
  • EMBO Molecular Medicine
  • Michael R Stratton

I come from a family in which there have been no scientists or doctors. I was interested, however, in biology at school and started my scientific career by training in medicine at Oxford University and Guy's Hospital, London. Practising as a doctor reinforced my curiosity about the biological processes underlying human disease. As a consequence, I pursued a clinical vocation in histopathology, a discipline that couples exposure to the sights and smells of the autopsy room with a daily journey into the often beautiful, sometimes ugly world of healthy and diseased human tissues under the microscope. After an introduction to general histopathology in Nick Wright's department at the Hammersmith Hospital, London, I completed my postgraduate medical training in neuropathology with Peter Lantos at the Maudsley Hospital, London. Many of the tissue samples examined by pathologists are from cancers. The clonal theory of cancer development and the general role of DNA mutations in generating cancer cell clones had been established by 1986, when I was working as a junior pathologist. Indeed, the first mutated cancer gene, HRAS, had recently been identified through application of the then-new techniques of recombinant DNA technology. Peering at the nuclei of cancer cells under the microscope, for me it was a matter of fascination that hidden within them were the key events converting normal cells into cancer cells, and frustration because they were out of reach. So, I took a 3-year break from medicine to study for a PhD, learning the methods and thinking of molecular oncology in Colin Cooper's laboratory at the Institute …

  • Research Article
  • Cite Count Icon 27
  • 10.1093/bioinformatics/btaa242
CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome
  • Apr 13, 2020
  • Bioinformatics
  • Mark F Rogers + 2 more

Motivation: Next-generation sequencing technologies have accelerated the discovery of single nucleotide variants in the human genome, stimulating the development of predictors for classifying which of these variants are likely functional in disease, and which neutral. Recently, we proposed CScape, a method for discriminating between cancer driver mutations and presumed benign variants. For the neutral class, this method relied on benign germline variants found in the 1000 Genomes Project database. Discrimination could, therefore, be influenced by the distinction of germline versus somatic, rather than neutral versus disease driver. This motivates this article in which we consider predictive discrimination between recurrent and rare somatic single point mutations based solely on using cancer data, and the distinction between these two somatic classes and germline single point mutations.

Results: For somatic point mutations in coding and non-coding regions of the genome, we propose CScape-somatic, an integrative classifier for predictively discriminating between recurrent and rare variants in the human cancer genome. In this study, we use purely cancer genome data and investigate the distinction between minimal occurrence and significantly recurrent somatic single point mutations in the human cancer genome. We show that this type of predictive distinction can give novel insight, and may deliver more meaningful prediction in both coding and non-coding regions of the cancer genome. Tested on somatic mutations, CScape-somatic outperforms alternative methods, reaching 74% balanced accuracy in coding regions and 69% in non-coding regions, whereas even higher accuracy may be achieved using thresholds to isolate high-confidence predictions.

Availability and implementation: Predictions and software are available at http://CScape-somatic.biocompute.org.uk/.

Contact: mark.f.rogers.phd@gmail.com or C.Campbell@bristol.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
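The performance figures above are quoted as balanced accuracy, which is the mean of sensitivity and specificity and so is not inflated when one class dominates. The sketch below shows the standard binary definition on invented toy labels; the function name and the data are hypothetical and are not taken from CScape-somatic.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)


# Invented toy example: 2 recurrent (1) and 8 rare (0) mutations.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

toy_balanced = balanced_accuracy(toy_true, toy_pred)
toy_plain = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
```

On this toy set, plain accuracy is 0.7 while balanced accuracy is 0.625, because only half of the minority class is recovered.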

  • Book Chapter
  • Cite Count Icon 1
  • 10.1002/9780470015902.a0023379
Characterising Somatic Mutations in Cancer Genome by Means of Next‐generation Sequencing
  • Feb 15, 2012
  • Mei Ling Chong + 3 more

Cancer genome sequencing studies have identified tens of thousands of somatic mutations in various human cancers to date. This data has started to generate new insights into mutation patterns and their differences between various cancers. Further, the mutation patterns have also helped to elucidate the mechanisms involved in generating mutations in the cancer genome, such as deoxyribonucleic acid (DNA)‐repair processes and mutagen exposures. With the introduction of next‐generation sequencing technologies, cancer genome sequencing has evolved from a targeted sequencing approach to whole‐exome sequencing and whole‐genome sequencing (WGS) approaches. However, each sequencing approach has its strengths and limitations. It is widely anticipated that WGS would eventually replace targeted and exome sequencing. WGS offers a unique advantage to study structural variants or rearrangements and fusion genes in a single experiment, in addition to point mutations. However, WGS is currently still prohibitively expensive for a large number of samples. Despite these technological advances, several challenges still remain, such as discerning driver mutations from benign mutations and collection of high‐quality primary tumour tissues to minimise tissue heterogeneity. Ultimately, a comprehensive delineation of the somatic mutations in the cancer genome would require WGS of a large number of samples from various cancer types and subtypes. Congruent to this goal, the International Cancer Genome Consortium was initiated and upon completion of the project, its data is expected to further enhance our knowledge and understanding of the biological mechanisms underlying cancer development.

Key Concepts: Several sequencing approaches are available to decipher the somatic mutation profile of the cancer genome, including targeted sequencing, whole‐exome sequencing (WES) and whole‐genome sequencing (WGS). Targeted sequencing is a hypothesis‐driven approach where candidate genes are often selected based on knowledge of previously reported mutated genes or their related functions in cancer development, such as the kinome or phosphatome. The ability to study different cancers and their subtypes in a comparatively higher throughput than exome sequencing and WGS is an important advantage of the targeted sequencing approach. The introduction of multiple commercial exome enrichment kits has circumvented the technical challenges in isolating the entire exome for sequencing. Although WES interrogates only approximately 1–2% of the entire human genome, the several hundred somatic mutations identified in WES studies are already too numerous for all of them to be examined by follow‐up studies in larger sample series. The advances of NGS technologies and rapidly declining sequencing cost have enabled the completion of a number of WGS studies for various cancers. The patterns of somatic mutations provided by WGS are useful in elucidating the mechanisms of generating the mutations in the cancer genome, such as DNA‐repair processes and differences in mutagen exposure. A further advantage of WGS over the other two sequencing approaches is its ability to identify structural rearrangements. Although WES and WGS approaches have now been shown to be technically feasible, several challenges still remain. Delineating patterns of somatic mutations in the cancer genome requires a comprehensive interrogation of cancer genomes (entire genome, exome or a large number of candidate genes) in series of up to hundreds of samples.

  • Preprint Article
  • 10.1158/0008-5472.c.6512164
Data from A Deep Learning Framework Identifies Pathogenic Noncoding Somatic Mutations from Personal Prostate Cancer Genomes
  • Mar 31, 2023
  • Cheng Wang + 1 more

Our understanding of noncoding mutations in cancer genomes has been derived primarily from mutational recurrence analysis by aggregating clinical samples on a large scale. These cohort-based approaches cannot directly identify individual pathogenic noncoding mutations from personal cancer genomes. Therefore, although most somatic mutations are localized in the noncoding cancer genome, their effects on driving tumorigenesis and progression have not been systematically explored and noncoding somatic alleles have not been leveraged in current clinical practice to guide personalized screening, diagnosis, and treatment. Here, we present a deep learning framework to capture pathogenic noncoding mutations in personal cancer genomes, which perturb gene regulation by altering chromatin architecture. We deployed the system specifically for localized prostate cancer by integrating large-scale prostate cancer genomes and the prostate-specific epigenome. We exhaustively evaluated somatic mutations in each patient's genome and agnostically identified thousands of somatic alleles altering the prostate epigenome. Functional genomic analyses subsequently demonstrated that affected genes displayed differential expression in prostate tumor samples, were vulnerable to expression alterations, and were convergent onto androgen receptor–mediated signaling pathways. Accumulation of pathogenic regulatory mutations in these affected genes was predictive of clinical observations, suggesting potential clinical utility of this approach. Overall, the deep learning framework has significantly expanded our view of somatic mutations in the vast noncoding genome, uncovered novel genes in localized prostate cancer, and will foster the development of personalized screening and therapeutic strategies for prostate cancer.

Significance: This study's characterization of the noncoding genome in prostate cancer reveals mutational signatures predictive of clinical observations, which may serve as a powerful prognostic tool in this disease.

  • Research Article
  • Cite Count Icon 12
  • 10.1158/0008-5472.can-20-1791
A Deep Learning Framework Identifies Pathogenic Noncoding Somatic Mutations from Personal Prostate Cancer Genomes.
  • Nov 1, 2020
  • Cancer Research
  • Cheng Wang + 1 more

Our understanding of noncoding mutations in cancer genomes has been derived primarily from mutational recurrence analysis by aggregating clinical samples on a large scale. These cohort-based approaches cannot directly identify individual pathogenic noncoding mutations from personal cancer genomes. Therefore, although most somatic mutations are localized in the noncoding cancer genome, their effects on driving tumorigenesis and progression have not been systematically explored and noncoding somatic alleles have not been leveraged in current clinical practice to guide personalized screening, diagnosis, and treatment. Here, we present a deep learning framework to capture pathogenic noncoding mutations in personal cancer genomes, which perturb gene regulation by altering chromatin architecture. We deployed the system specifically for localized prostate cancer by integrating large-scale prostate cancer genomes and the prostate-specific epigenome. We exhaustively evaluated somatic mutations in each patient's genome and agnostically identified thousands of somatic alleles altering the prostate epigenome. Functional genomic analyses subsequently demonstrated that affected genes displayed differential expression in prostate tumor samples, were vulnerable to expression alterations, and were convergent onto androgen receptor-mediated signaling pathways. Accumulation of pathogenic regulatory mutations in these affected genes was predictive of clinical observations, suggesting potential clinical utility of this approach. Overall, the deep learning framework has significantly expanded our view of somatic mutations in the vast noncoding genome, uncovered novel genes in localized prostate cancer, and will foster the development of personalized screening and therapeutic strategies for prostate cancer. 
SIGNIFICANCE: This study's characterization of the noncoding genome in prostate cancer reveals mutational signatures predictive of clinical observations, which may serve as a powerful prognostic tool in this disease.

  • Book Chapter
  • 10.1002/9780470015902.a0023262
Cancer Genome Sequencing
  • Dec 15, 2010
  • Chee Seng Ku + 3 more

The recent advances in high‐throughput sequencing technologies have enabled several whole cancer genomes to be sequenced. In addition, a number of large‐scale targeted resequencing studies have also been performed previously using Sanger sequencing methods. These studies have identified numerous somatic mutations in cancer genomes and provided new insights into the patterns of mutations in different cancer types. Several challenges remain in cancer genome sequencing such as accurately detecting different types of somatic mutations, the difficulty in identifying driver mutations, bioinformatics and analytical challenges in analysing the sequencing data and the cost of whole genome resequencing restricting the studies to a few genomes. However, cancer genome sequencing will eventually emerge as a routine tool to dissect the cancer genomes especially with the arrival of third generation sequencing technologies. The cancer genome resequencing studies have so far produced encouraging results to stimulate further studies to sequence more cancer genomes. These studies have made a significant contribution to the understanding of the somatic mutational profile of various cancers.

Key Concepts: The genetic alterations of cancer occurring at the DNA sequence level can be classified as germline or somatic. Somatic mutations can occur in the cancer genome in several different forms such as single and double nucleotide variants or base substitutions, small insertion–deletions (indels) and larger structural chromosomal alterations. The recent advances in dissecting the somatic mutational profile of cancer genomes have been driven by high‐throughput or next‐generation sequencing (NGS) technologies which have enabled several whole cancer genomes to be sequenced for the first time. The involvement of somatic mutations in cancer initiation and progression, in addition to germline variations, is well recognised. Cancer genomes are characterised by their genomic instability, which results in numerous somatic mutations and has proved challenging to investigate. Although a large number of somatic mutations have been detected in cancer genomes, only a small subset is predicted to be ‘driver’ mutations and the remainder considered ‘passenger’ mutations. Driver mutations are the mutations that initiate and drive oncogenesis steps, such as cell proliferation, tumour growth, angiogenesis, tissue invasion and metastasis. Several challenges remain in cancer genome sequencing, such as accurately detecting different types of somatic mutations, identifying driver mutations, bioinformatics and analytical challenges, and the cost of whole genome resequencing, which has restricted the studies to a few genomes. Currently there are no major obstacles in cataloguing somatic mutations in cancer genomes. The real challenge lies in data interpretation and how the data can be used to discover new drugs or molecular markers for clinical applications. The ultimate goals of cancer genome sequencing are to improve the clinical management of patients and to create personalised medicine through the development of new therapeutic agents which are tailored to the individual based on their genetic information.

  • Research Article
  • Cite Count Icon 9
  • 10.1039/c0mb00211a
Finding co-mutated genes and candidate cancer genes in cancer genomes by stratified false discovery rate control
  • Jan 1, 2011
  • Molecular BioSystems
  • Jing Wang + 6 more

Finding candidate cancer genes playing causal roles in carcinogenesis is an important task in cancer research. The non-randomness of the co-mutation of genes in cancer samples can provide statistical evidence for these genes' involvement in carcinogenesis. It can also provide important information on the functional cooperation of gene mutations in cancer. However, due to the relatively small sample sizes used in current high-throughput somatic mutation screening studies and the extraordinarily large-scale hypothesis tests, the statistical power of finding co-mutated gene pairs based on high-throughput somatic mutation data of cancer genomes is very low. Thus, we proposed a stratified FDR (False Discovery Rate) control approach for identifying significantly co-mutated gene pairs according to the mutation frequency of genes. We then compared the identified co-mutated gene pairs separately by pre-selecting genes with higher mutation frequencies and by the stratified FDR control approach. Finally, we searched for pairs of pathways annotated with significantly more between-pathway co-mutated gene pairs to evaluate the functional roles of the identified co-mutated gene pairs. Based on two datasets of somatic mutations in cancer genomes, we demonstrated that, at a given FDR level, the power of finding co-mutated gene pairs could be increased by pre-selecting genes with higher mutation frequencies. However, many true co-mutations between genes with lower mutation rates will still be missed. By the stratified FDR control approach, many more co-mutated gene pairs could be found. Finally, the identified pathway pairs significantly overrepresented with between-pathway co-mutated gene pairs suggested that their co-dysregulations may play causal roles in carcinogenesis. 
The stratified FDR control strategy is efficient in identifying co-mutated gene pairs and the genes in the identified co-mutated gene pairs can be considered as candidate cancer genes because their non-random co-mutations in cancer genomes are highly unlikely to be attributable to chance.
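The abstract does not specify which FDR procedure is applied within each stratum. As a rough sketch, assuming the Benjamini-Hochberg step-up rule is run separately inside each mutation-frequency stratum (an assumption made here for illustration), the idea can be compared with pooled testing as follows. The function names, strata labels and p-values are all hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float], alpha: float) -> List[bool]:
    """Return one reject flag per p-value using the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def stratified_bh(strata: Dict[str, List[float]], alpha: float) -> Dict[str, List[bool]]:
    """Apply Benjamini-Hochberg independently within each stratum (assumed scheme)."""
    return {name: benjamini_hochberg(p, alpha) for name, p in strata.items()}


# Invented p-values for gene pairs, grouped by a hypothetical mutation-frequency stratum.
toy_strata = {
    "high_frequency": [0.001, 0.004, 0.03, 0.5],
    "low_frequency": [0.02, 0.6, 0.7, 0.8, 0.9, 0.95],
}
stratified = stratified_bh(toy_strata, alpha=0.05)
pooled = benjamini_hochberg(
    toy_strata["high_frequency"] + toy_strata["low_frequency"], alpha=0.05
)
n_stratified = sum(sum(flags) for flags in stratified.values())
n_pooled = sum(pooled)
```

In this toy example the stratified procedure rejects three hypotheses while the pooled procedure rejects two, which mirrors the abstract's claim that stratification can recover more co-mutated pairs at the same nominal FDR level.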

  • Dissertation
  • 10.5167/uzh-164279
Characterization of cancer genomes through systematic analyses of oncogenomic data assemblies
  • Jan 1, 2013
  • Haoyang Cai

Cancer is the most common genetic disease in humans. It has been estimated that more than 10 million new cancer patients are detected worldwide each year. In the last decades, many efforts have been made by the research community to contribute to the fight against cancer. These works greatly expanded our understanding of the disease. However, the exact mechanisms of cancer initiation and progression remain elusive. The research on cancer genomes has focused on the identification of DNA sequence mutations and chromosomal rearrangements. Some of these somatic alterations can confer a growth advantage to cancer cells and promote cancer development. Mutated genes in cancer genomes can be potential new drug targets or serve as biomarkers for the improvement of diagnostics and therapy. Today, high-throughput genome-wide profiling technologies allow us to characterize the molecular profiles of cancer samples on various levels, including copy number alterations, gene expression, point mutations and epigenetic marks. Cancer research has gradually shifted from single experiments to large-scale “omics” data analysis approaches. It is an exciting, but challenging work. Our group aims to develop reliable and robust methods to characterize cancer genomes by analyzing large-scale oncogenomic datasets. During the last 4 years, I have focused my efforts on using systems biology and statistical methods to model and annotate genomic array data in human cancer. My research is based on a data collection and re-analysis project that generates very large amounts of microarray data. Computational biology approaches were applied on this dataset for data mining. We collected more than 40000 arrays, including comparative genomic hybridization (CGH) and SNP (single nucleotide polymorphism) arrays, from several public databases. A pipeline was developed to process raw data and determine copy number aberrations (CNAs). 
All data was converted to a unified and structured format, and stored in our arrayMap database, together with available clinical information. We also set up an online website for providing this resource to the research community. Based on the large-scale CNA data in our database, the second project aimed to explore the correlation between CNAs and local gene density across cancer genomes. Through a genome binning method, I found that focal CNAs are significantly enriched in gene-rich regions. In addition, this positive correlation is not only driven by cancer genes. Since this result is derived from more than 16000 cancer samples, it provides a global insight into the relationship between cancer genome instability and structure from a new perspective. The enrichment reveals that there may be a non-neutral selection pressure for CNA regions across the genome. The observed significant positive correlation in this project may enable a better elucidation of mechanisms by which CNAs contribute to tumor development, and promote a more systematic understanding of cancer. The third project presented here is related to a new phenomenon, termed “chromothripsis”, found in cancer development. In this type of event, contiguous chromosomal regions are fragmented into many pieces, and the cell’s DNA repair machinery randomly fuses these segments together to rescue the genome. This is quite different from the classical step-by-step model of cancer development. We developed an algorithm based on scan statistics to automatically detect chromothripsis-like patterns, and identify both size and location of the involved regions. From our input of 22,347 high quality arrays, we identified 918 chromothripsis cases, representing 132 cancer types. The results from this dataset provide several new insights regarding the distribution of chromothripsis-like patterns and a comprehensive estimation of chromothripsis incidence in a large range of cancer entities. 
Importantly, our work partly overcomes the limitation of individual research projects resulting from the relatively low incidence of chromothripsis in cancer samples available. An investigation into the affected chromosomal regions supports breakage-fusion-bridge cycles as one of the potential underlying mechanisms. Finally, we evaluated the clinical associations of chromothripsis and found that this event may be associated with a poor outcome. The observed chromothripsis events in our project may reflect on heterogenous biological phenomena, and probably vary in their specific impact on oncogenesis. Taken together, the results presented in this thesis characterize the cancer genome by large-scale oncogenomic array data, and further elucidate the potential mechanisms underlying cancer development.

  • Research Article
  • Cite Count Icon 24
  • 10.1038/ejhg.2016.129
Improving the in silico assessment of pathogenicity for compensated variants.
  • Oct 5, 2016
  • European Journal of Human Genetics
  • Luisa Azevedo + 6 more

Understanding the functional sequelae of amino-acid replacements is of fundamental importance in medical genetics. Perhaps, the most intuitive way to assess the potential pathogenicity of a given human missense variant is by measuring the degree of evolutionary conservation of the substituted amino-acid residue, a feature that generally serves as a good proxy metric for the functional/structural importance of that residue. However, the presence of putatively compensated variants as the wild-type alleles in orthologous proteins of other mammalian species not only challenges this classical view of amino-acid essentiality but also precludes the accurate evaluation of the functional impact of this type of missense variant using currently available bioinformatic prediction tools. Compensated variants constitute at least 4% of all known missense variants causing human-inherited disease and hence represent an important potential source of error in that they are likely to be disproportionately misclassified as benign variants. The consequent under-reporting of compensated variants is exacerbated in the context of next-generation sequencing where their inappropriate exclusion constitutes an unfortunate natural consequence of the filtering and prioritization of the very large number of variants generated. Here we demonstrate the reduced performance of currently available pathogenicity prediction tools when applied to compensated variants and propose an alternative machine-learning approach to assess likely pathogenicity for this particular type of variant.

  • Research Article
  • Cite Count Icon 362
  • 10.1093/emboj/17.13.3556
Yeast PKA represses Msn2p/Msn4p-dependent gene expression to regulate growth, stress response and glycogen accumulation.
  • Jul 1, 1998
  • The EMBO Journal
  • A Smith

Yeast cAMP-dependent protein kinase (PKA) activity is essential for growth and antagonizes induction of the general stress response as well as accumulation of glycogen stores. Previous studies have suggested that the PKA effects on the two latter processes result in part from transcription repression. Here we show that transcription derepression that accompanies PKA depletion is dependent upon the presence of two redundant Zn2+-finger transcription factors, Msn2p and Msn4p. The Msn2p and Msn4p proteins were shown previously to act as positive transcriptional factors in the stress response pathway, and our results suggest that Msn2p and Msn4p also mediate PKA-dependent effects on stress response as well as glycogen accumulation genes. Interestingly, PKA activity is dispensable in a strain lacking Msn2p and Msn4p activity. Thus, Msn2p and Msn4p may antagonize PKA-dependent growth by stimulating expression of genes that inhibit growth. In agreement with this model, Msn2p/Msn4p function is required for expression of a gene, YAK1, previously shown to antagonize PKA-dependent growth. These results suggest that Msn2p/Msn4p-dependent gene expression may account for all, or at least most, of the pleiotropic effects of yeast PKA, including growth regulation, response to stress and carbohydrate store accumulation.

  • Research Article
  • Cite Count Icon 213
  • 10.1038/nature17437
Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes.
  • Apr 13, 2016
  • Nature
  • Dilmi Perera + 5 more

Promoters are DNA sequences that have an essential role in controlling gene expression. While recent whole cancer genome analyses have identified numerous hotspots of somatic point mutations within promoters, many have not yet been shown to perturb gene expression or drive cancer development. As such, positive selection alone may not adequately explain the frequency of promoter point mutations in cancer genomes. Here we show that increased mutation density at gene promoters can be linked to promoter activity and differential nucleotide excision repair (NER). By analysing 1,161 human cancer genomes across 14 cancer types, we find evidence for increased local density of somatic point mutations within the centres of DNase I-hypersensitive sites (DHSs) in gene promoters. Mutated DHSs were strongly associated with transcription initiation activity, in which active promoters but not enhancers of equal DNase I hypersensitivity were most mutated relative to their flanking regions. Notably, analysis of genome-wide maps of NER shows that NER is impaired within the DHS centre of active gene promoters, while XPC-deficient skin cancers do not show increased promoter mutation density, pinpointing differential NER as the underlying cause of these mutation hotspots. Consistent with this finding, we observe that melanomas with an ultraviolet-induced DNA damage mutation signature show greatest enrichment of promoter mutations, whereas cancers that are not highly dependent on NER, such as colon cancer, show no sign of such enrichment. Taken together, our analysis has uncovered the presence of a previously unknown mechanism linking transcription initiation and NER as a major contributor of somatic point mutation hotspots at active gene promoters in cancer genomes.

  • Research Article
  • 10.1158/0008-5472.fbcr09-a23
Abstract A23: An integrated platform for the functional annotation of the cancer genome
  • Dec 1, 2009
  • Cancer Research
  • Jesse S Boehm + 21 more

The systematic characterization of mutations in cancer genomes through efforts such as The Cancer Genome Atlas will lead to a comprehensive list of alterations associated with particular cancers. A powerful complementary approach is to comprehensively characterize the functional basis of cancer, by identifying the genes essential for growth and related phenotypes in different cancer cells. Such information would be particularly valuable for identifying potential drug targets. The recent development of an efficient, robust approach to perform genome-scale pooled shRNA screens now permits the highly parallel identification of essential genes in cancer cells in a cost-effective manner. We have initiated a project to identify essential genes in 300 cancer cell lines representing a diverse range of lineages and genotypes. In each screen the abundance of 55,000 shRNA constructs targeting 11,000 genes is monitored in quadruplicate at the completion of 16 population doublings via hybridization of half-hairpin barcodes to a custom Affymetrix microarray. We have developed multiple complementary approaches for the analysis of this screening data at the shRNA level and at the gene level. shRNA level analytical tools include signal to noise and fold depletion metrics to identify individual shRNA constructs whose abundance at the completion of the experiment discriminates two classes of cell lines (e.g., KRASmut vs. KRASwt). Gene level analytical tools include RIGER, a gene-set enrichment analysis (GSEA)-based non-parametric algorithm which treats the 5 shRNA constructs targeting a given gene as a set and assesses bias of each gene-shRNA set as showing evidence of depletion during the experiment. 
Using these tools, we have begun to systematically identify known and novel anti-cancer drug targets via the integration of these functional screening results with corresponding structural cancer genomic data derived from both the screened cell lines and from known alterations in tumor samples. To facilitate this analysis, each of the screened cell lines has undergone comprehensive molecular characterization (DNA copy number, RNA expression, OncoMap high-throughput mutation profiling) to identify the genomic alterations harbored in its genome. Our preliminary data suggest that this integrated approach is efficient at pinpointing molecular targets that not only include genes altered in cancer genomes but additionally include genes exhibiting a synthetic lethal relationship with an oncogenic driver mutation (e.g., KRAS). We are validating candidate molecular targets using both loss-of-function and gain-of-function secondary screens. To facilitate these gain-of-function screens, we are creating a library of human open reading frames (ORFs) by sequencing and transferring the Human ORFeome collection, developed by the Center for Cancer Systems Biology at the Dana-Farber Cancer Institute, from Gateway Entry vectors into lentiviral expression vectors. This integrated platform for the unbiased, systematic functional annotation of the cancer genome represents an opportunity to identify molecular targets at genome-scale. Citation Information: Cancer Res 2009;69(23 Suppl):A23.

  • Preprint Article
  • 10.1158/1535-7163.c.6532013.v1
Data from Systematic Interpretation of Comutated Genes in Large-Scale Cancer Mutation Profiles
  • Mar 31, 2023
  • Zheng Guo + 11 more

By high-throughput screens of somatic mutations of genes in cancer genomes, hundreds of cancer genes are being rapidly identified, providing us abundant information for systematically deciphering the genetic changes underlying cancer mechanism. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to distribute between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that based on Gene Ontology that describes gene functions at various specific levels, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as the between-pathway models, we showed they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples under this study, suggesting that their codisruptions might be functionally essential in inducing these cancers. All together, our results would provide a channel to detangle the complex collaboration of the molecular processes underlying cancer mechanism. Mol Cancer Ther; 9(8); 2186–95. ©2010 AACR.

  • Preprint Article
  • 10.1158/1535-7163.c.6532013
Data from Systematic Interpretation of Comutated Genes in Large-Scale Cancer Mutation Profiles
  • Mar 31, 2023
  • Da Yang + 11 more

