Abstract

The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are ‘gene-centric’ in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new ‘domain-centric’ method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots’ unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods.
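The contrast between gene-centric and domain-centric counting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, the residue positions, the mapping to domain columns, and the helper functions are hypothetical, and the sketch does not reproduce the oncodomain hotspot method itself, which would also need real domain alignments and a test of statistical significance.

```python
from collections import Counter

# Hypothetical toy data: one (gene, protein residue position) pair per
# observed missense mutation. Gene names are placeholders, not real genes.
mutations = [
    ("KINASE_A", 120),
    ("KINASE_B", 305),
    ("KINASE_C", 87),
    ("KINASE_A", 45),
]

# Hypothetical mapping from (gene, residue) to (domain family, alignment column).
# In practice this would come from aligning each protein to a domain model;
# here it is hard-coded purely for illustration.
domain_column = {
    ("KINASE_A", 120): ("Pkinase", 64),
    ("KINASE_B", 305): ("Pkinase", 64),
    ("KINASE_C", 87): ("Pkinase", 64),
    ("KINASE_A", 45): ("Pkinase", 12),
}


def gene_centric_counts(muts):
    """Count mutations per (gene, residue): each gene is considered alone."""
    return Counter(muts)


def domain_centric_counts(muts, mapping):
    """Count mutations per (domain, column), pooling across the gene family."""
    counts = Counter()
    for mutation in muts:
        key = mapping.get(mutation)
        if key is not None:  # skip mutations that fall outside any domain
            counts[key] += 1
    return counts


gene_counts = gene_centric_counts(mutations)
domain_counts = domain_centric_counts(mutations, domain_column)

for key, n in domain_counts.most_common():
    print(key, n)
```

In this toy data every gene-level site is mutated only once, so no single gene stands out, while pooling by domain column places three of the four mutations at the same aligned position. That pooling is the intuition behind assessing rare variants by comparison to similar genes.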

Highlights

  • In recent years, studies analyzing sequenced tumor genomes have seen a drastic increase in sample size, growing from only a handful of samples to cohorts of several thousand patients

  • Even early gene-centric, data-driven analyses of sequenced tumor genomes, such as the CaMP score, found that the genomic landscapes of somatic mutations in cancer are dominated by ‘gene hills’, or gene regions that are mutated at a low frequency

  • We demonstrate that oncodomain hotspots overlap well with the cancer genomics literature and the results of both gene- and domain-centric methods, and that our method is unique in the ability to detect variants that occur with low frequency in tumor samples but have evidence of cancer involvement or are predicted to be driver mutations by Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM)


Summary

Introduction

Studies analyzing sequenced tumor genomes have seen a drastic increase in sample size, growing from only a handful of samples to cohorts of several thousand patients. Massive ongoing sequencing projects like The Cancer Genome Atlas (TCGA) have discovered thousands of genes that are mutated in only a small fraction of tumors yet may still be important for cancer initiation or progression [7,16,17,18]. This has led to a rise in the availability of tools for analyzing and visualizing data [19,20,21,22,23] and for identifying genes in tumor samples that are likely to harbor somatic variants that drive cancer initiation or progression [1,2,24,25]. Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM) [34] is a machine learning predictor trained to distinguish variants known to drive cancer progression from putatively neutral variants using properties of genomic and protein sequence, predicted protein structure, and multiple sequence alignments.
