Abstract

The Cancer Genome Atlas (TCGA) provides a genetic characterization of more than ten thousand tumors, enabling the discovery of novel driver mutations, molecular subtypes, and promising drug targets across many histologies. Here we investigated why some mutations are common in particular cancer types but absent in others. As an example, we observed that the gene CCDC168 has no called mutations in the stomach adenocarcinoma (STAD) cohort, despite being frequently mutated in other tumor types. Surprisingly, we found that this lack of called mutations was due to a systematic shortage of sequencing reads in the STAD and other cohorts, rather than to differences in driver biology. Using strict filtering criteria, we found similar behavior in four other genes across TCGA cohorts, each exhibiting systematic sequencing depth problems that impaired mutation calling. We identified the choice of exome capture kit as the culprit, because kit choice was strongly associated with the set of genes that have too few reads to call a mutation. Overall, we found that thousands of samples across all cohorts are affected by capture kit-related problems. For example, among the 6353 samples sequenced with the Broad Institute’s Custom capture kit, at least 4833 genes show undercalling biases. False negative mutation calls at these genes may obscure biological similarities between tumor types, as well as other important cancer driver effects, in TCGA datasets.
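
The abstract does not say how the association between capture kit and insufficient read depth was tested. As a minimal sketch, assuming a hypothetical per-gene depth cutoff (MIN_DEPTH = 8) and placeholder kit labels "A" and "B", one could count, for a single gene, how many samples from each of two kits fall below the cutoff and apply a two-sided Fisher's exact test to the resulting 2x2 table. The data layout, the cutoff, and the helper names are illustrative assumptions, not the authors' method.

```python
from math import comb

# Hypothetical read-depth cutoff below which a site is treated as
# uncallable; not a value taken from the paper.
MIN_DEPTH = 8


def hypergeom_pmf(x, row1, col1, n):
    """P(top-left cell == x) for a 2x2 table with fixed margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1 = a + b          # samples on the first kit
    col1 = a + c          # samples with insufficient depth
    n = a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        px = hypergeom_pmf(x, row1, col1, n)
        # Sum every table no more probable than the observed one;
        # the small tolerance guards against floating-point ties.
        if px <= p_obs * (1 + 1e-7):
            p += px
    return min(p, 1.0)


def kit_depth_table(samples, gene, kit_a, kit_b, min_depth=MIN_DEPTH):
    """Count samples per kit with depth below / at-or-above min_depth.

    `samples` is a hypothetical list of dicts such as
    {"kit": "A", "depth": {"GENE1": 42}}; a gene absent from "depth"
    is treated as having zero reads.
    """
    table = [[0, 0], [0, 0]]
    for s in samples:
        if s["kit"] == kit_a:
            row = 0
        elif s["kit"] == kit_b:
            row = 1
        else:
            continue  # ignore samples from other kits
        col = 0 if s["depth"].get(gene, 0) < min_depth else 1
        table[row][col] += 1
    return table


if __name__ == "__main__":
    # Toy data: kit "A" covers GENE1 poorly, kit "B" covers it well.
    samples = ([{"kit": "A", "depth": {"GENE1": 2}} for _ in range(10)]
               + [{"kit": "B", "depth": {"GENE1": 60}} for _ in range(10)])
    (a, b), (c, d) = kit_depth_table(samples, "GENE1", "A", "B")
    print([[a, b], [c, d]], fisher_exact_two_sided(a, b, c, d))
```

On the toy data every kit "A" sample is below the cutoff and every kit "B" sample is above it, so the table is [[10, 0], [0, 10]] and the p-value is 2/184756 (about 1.1e-5). Screening thousands of genes this way would also call for a multiple-testing correction.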

Highlights

  • The Cancer Genome Atlas (TCGA) has been a valuable resource for shedding light on tumor genetics and molecular biology, enabling the move toward targeted-therapy oncology clinical trials such as NCI’s MATCH [1]

  • We investigated whether CCDC168 and other TCGA mutation calls are affected by measurement bias by examining features of each cancer sample that are associated with a failure to call mutations

  • We demonstrated that the choice of exome capture kit systematically affects mutation calling in a cohort-dependent manner; as case studies, we considered five genes whose mutations were repeatedly uncalled across diverse cohorts even under stringent filtering criteria

Introduction

The Cancer Genome Atlas (TCGA) has been a valuable resource for shedding light on tumor genetics and molecular biology, enabling the move toward targeted-therapy oncology clinical trials such as NCI’s MATCH [1]. One of TCGA’s many strengths is the coverage and depth of its whole-exome sequencing (WES) protocol: an average coverage of approximately 100x [2] has been used to call mutations confidently, even at allele frequencies of 0.2 or below, using MuTect [3]. This mutation-calling power has enabled important translational research, such as the identification of targetable driver mutations [4].
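
To illustrate why read depth matters for calling low-allele-frequency mutations, the sketch below uses a simplified binomial model: at a site with `depth` reads and variant allele fraction `vaf`, the number of variant-supporting reads is treated as Binomial(depth, vaf), and a mutation counts as detectable if at least `min_alt_reads` variant reads are observed. The threshold rule and its default of 5 are hypothetical stand-ins; MuTect itself scores candidates with a likelihood-based LOD statistic, and this model also ignores sequencing error and tumor purity.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def detection_power(depth, vaf, min_alt_reads=5):
    """Probability of observing at least min_alt_reads variant reads.

    min_alt_reads=5 is a hypothetical calling threshold, not MuTect's rule.
    """
    if min_alt_reads <= 0:
        return 1.0
    if depth < min_alt_reads:
        return 0.0  # too few reads to ever reach the threshold
    return binom_sf(min_alt_reads, depth, vaf)


if __name__ == "__main__":
    # Power at a fixed allele fraction of 0.2 across several depths.
    for depth in (3, 10, 30, 100):
        print(depth, round(detection_power(depth, vaf=0.2), 4))
```

Under these assumptions, a site covered at about 100x with an allele fraction of 0.2 yields about 20 variant reads on average and is almost always detectable, whereas at 10x the chance of reaching five variant reads falls to roughly 3%, and at 3x it is zero. This shows how coverage shortfalls translate directly into missed mutation calls.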
