Abstract

Genomics and genome screening are proving central to the study of cancer. However, a good appreciation of the protein structures coded by cancer genes is also invaluable, especially for understanding function, for assessing the ligandability of potential targets, and for designing new drugs. To complement the wealth of information on the genetics of cancer in COSMIC, the most comprehensive database of cancer somatic mutations available, experimentally determined structural information has recently been brought together in COSMIC-3D. Even where structural information is available for a gene in the Cancer Gene Census, a list of genes in COSMIC with substantial evidence supporting their impact in cancer, this information is quite often for a single domain in a larger protein or for a single protomer in a multiprotein assembly. Here, we show that over 60% of the genes included in the Cancer Gene Census are predicted to possess multiple domains. Many of the gene products also participate in multicomponent and membrane-associated molecular assemblies, and mutations recorded in COSMIC affect such assemblies. However, only 469 of the gene products have a structure represented in the PDB, and of these only 87 structures have 90–100% coverage of the sequence, while 69 have less than 10% coverage. As a first step towards bridging the gaps in our knowledge in the many cases where individual protein structures and domains are lacking, we discuss our attempts at modelling protein structures using our pipeline, investigating the effects of mutations using two of our in-house methods (SDM2 and mCSM), and identifying potential driver mutations. This allows us to begin to understand the effects of mutations not only on protein stability but also on protein-protein, protein-ligand and protein-nucleic acid interactions. In addition, we consider ways to combine the structural information with the wealth of mutation data available in COSMIC.
We discuss the impacts of COSMIC missense mutations on protein structure in order to identify and assess the molecular consequences of cancer-driving mutations.

Highlights

  • Cancer is one of the most common diseases afflicting humanity today and the second leading cause of death globally (WHO Key Facts, Feb 2018)

  • Of the 719 genes included in the Cancer Gene Census, 205 are predicted to be single-domain and 476 to be multidomain, leaving 38 genes with no Pfam domain predicted using HMMER3 (Fig 1A)

  • We focused on the missense mutations [56], of which 482 unique mutations are reported in COSMIC for the androgen receptor (AR); 221 unique mutations from 789 samples were mapped to the AR ligand-binding domain (LBD), the DNA-binding domain and the linker between the two (Fig 5B, shown in purple)


Summary

Introduction

Cancer is one of the most common diseases afflicting humanity today and the second leading cause of death globally (WHO Key Facts, Feb 2018). Cancer refers to any genetic disease that leads to uncontrolled cell proliferation, causing a tumor. Although we have a good description of the mutations that recur in common cancers, defining the structures of the gene products, which is important for predicting the impacts of most mutations, is much more challenging and expensive. This leads to a gap in our understanding of how the sequence data relate to the structure and function of the protein.

