Abstract Two large cancer genomics consortia recently published the largest and highest-quality consensus mutation calls for whole-exome sequencing (WES) and whole-genome sequencing (WGS) in cancer: The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), respectively. Together these datasets encompass more than 60M mutations from ~13,000 samples (~10,000 WES and ~3,000 WGS). An intersecting set of 742 samples, from 22 cancer types, was sequenced on both platforms, and mutations were identified using a combined 13 variant-calling tools (7 WES and 5 WGS). These samples are an ideal dataset for comparing the performance, reliability, and reproducibility of WES and WGS mutation calling in exons, and for providing the community with key exon-flanking regions that play a role in carcinogenesis. Mutation Annotation Format (MAF) files were collected as filtered under the strict criteria of their initial release, including elimination of germline contaminants and 8-oxoguanine artifacts, depth filtering, and repeat masking. Additional filtering included minimum coverage requirements and restriction of both WES and WGS calls to variants detected within targeted exons. Finally, we restricted our data to known cancer genes. After this final step, the 742 samples carry approximately 11.5K (WES) and 12.3K (WGS) mutations in covered exons of potential cancer driver genes. Preliminary results found that ~70% of samples had >80% congruent mutations between both platforms; ~25% of samples had >80% congruent mutation calls in one or the other platform; and the remaining samples replicated identical mutations poorly. Most of the variants unique to one sequencing platform were mutations with low variant allele fraction (VAF). We also explored regions of the genome that are captured by both technologies even though WES did not target these regions.
This is made possible by access to the primary data resources and by relaxing the filtering criteria to include other regions, such as 3' and 5' UTRs, exon-flanking regions, and intronic regions. We identified many recurrent mutations in non-exonic regions, corroborated by both platforms, that have not previously been reported in pan-cancer efforts. At this historic juncture, as preliminary results from whole-genome sequencing efforts emerge and large exome sequencing efforts taper, the 742 samples spanning both efforts can provide insight into the lessons learned from exome sequencing and a solid foundation for moving forward into whole-genome analysis. We will continue to glean insights into the etiology of human disease by using both technologies; however, these mutation calls highlight the challenges that still exist in somatic variant calling and provide grounds for more critical evaluation of genomic findings in cancer.

Citation Format: Matthew H. Bailey, Liang-Bo Wang, Wen-Wei Liang, Steven Foltz, Guanlan Dong, Michael C. Wendl, Michael McLellan, Angela C. Hirbe, Jared Simpson, Mark Gerstein, Li Ding. Reproducibility assessment of mutations calls in exome- and whole-genome sequencing using consensus calling from TCGA and ICGC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 419.
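
The per-sample congruence figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not define "congruent" precisely, so the sketch assumes one plausible reading: for each sample, the fraction of WES calls also found by WGS and the fraction of WGS calls also found by WES, with a mutation identified by chromosome, position, reference allele, and alternate allele. The function names, the tuple layout, and the three category labels are hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict


def per_sample_overlap(wes_calls, wgs_calls):
    """Map each shared sample to (fraction of its WES calls also in WGS,
    fraction of its WGS calls also in WES).

    Each call is a tuple: (sample, chrom, pos, ref, alt). This layout is an
    assumption for illustration, not the MAF column set.
    """
    wes = defaultdict(set)
    wgs = defaultdict(set)
    for sample, chrom, pos, ref, alt in wes_calls:
        wes[sample].add((chrom, pos, ref, alt))
    for sample, chrom, pos, ref, alt in wgs_calls:
        wgs[sample].add((chrom, pos, ref, alt))

    overlap = {}
    # Only samples with at least one call on both platforms are compared.
    for sample in wes.keys() & wgs.keys():
        shared = wes[sample] & wgs[sample]
        overlap[sample] = (
            len(shared) / len(wes[sample]),
            len(shared) / len(wgs[sample]),
        )
    return overlap


def classify(frac_wes, frac_wgs, threshold=0.8):
    """Bin a sample by how many platforms exceed the congruence threshold."""
    if frac_wes > threshold and frac_wgs > threshold:
        return "both"
    if frac_wes > threshold or frac_wgs > threshold:
        return "one"
    return "neither"


if __name__ == "__main__":
    wes_calls = [("S1", "1", 100, "A", "T"), ("S1", "1", 200, "G", "C")]
    wgs_calls = [
        ("S1", "1", 100, "A", "T"),
        ("S1", "1", 200, "G", "C"),
        ("S1", "2", 5, "C", "T"),
    ]
    for sample, (f_wes, f_wgs) in per_sample_overlap(wes_calls, wgs_calls).items():
        print(sample, round(f_wes, 2), round(f_wgs, 2), classify(f_wes, f_wgs))
```

Exact-match keys are the simplest choice; a real comparison would also need to normalize indel representation and chromosome naming between callers before intersecting.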