Abstract

Cancer testing is undergoing a revolution. While small gene panels previously dominated the landscape, whole exome sequencing (WES)-based strategies have recently emerged as valuable tools for identifying clinically significant mutations and are being rapidly adopted into routine clinical cancer care. Assessing the significance and actionability of these variants depends largely on the growth of public repositories of germline and somatic variants, such as ExAC, COSMIC, TCGA and ClinVar. However, efforts to integrate data from these sources have revealed numerous challenges. One of these is generating and using variant syntax in a standardized, unambiguous format, which is essential for variant search in both the clinical and scientific communities. Variants can be represented in many different ways: in the VCF file format, which is the de facto standard for sequencing data; in genomic or transcript-based coordinates using HGVS nomenclature; or as amino acid alterations in three-letter or single-letter codes, according to transcript and protein definitions that can vary with the database used. Analyses of dbSNP and COSMIC identified at least 350,000 and 27,000 variants, respectively, that have ambiguous VCF representations unless they are “normalized” (using software described by Tan et al., 2015). This indicates that, in the absence of normalization, one-to-one searches and annotations in VCF files may fail to find an exact match. At the coding and protein level, variants are typically reported according to the syntax recommendations of the Human Genome Variation Society (HGVS). In evaluating a number of tools for generating HGVS nomenclature (snpEff, VEP, and Variation Reporter), we found it challenging to reconcile syntax representations across these tools and databases.
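To illustrate why VCF representations can be ambiguous, the following is a minimal sketch of the kind of left-align-and-trim normalization described by Tan et al., 2015. It is not the software used in this work: the function name `normalize_variant`, the toy reference sequence, and the choice to raise an error at the start of the sequence are all assumptions made for illustration.

```python
def normalize_variant(pos, ref, alt, genome):
    """Left-align and trim a biallelic variant (illustrative sketch).

    pos    -- 1-based position of the first base of `ref`
    ref    -- reference allele
    alt    -- alternate allele
    genome -- reference sequence of the chromosome as a plain string
              (hypothetical stand-in for an indexed FASTA lookup)
    """
    changed = True
    while changed:
        changed = False
        # Trim a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                # Assumption: this sketch does not handle variants that
                # would need to be extended to the right instead.
                raise ValueError("cannot left-extend at sequence start")
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; deleting one "CA" unit can be written
# at several positions, all describing the same resulting sequence.
toy_genome = "GGGCACACACAGGG"
representations = [(8, "CAC", "C"), (10, "CAG", "G"), (3, "GCACA", "GCA")]
normalized = {normalize_variant(p, r, a, toy_genome) for p, r, a in representations}
print(normalized)
```

In this toy case the three differently written deletions collapse to a single normalized record, which is what makes an exact-match search across files or databases possible.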
We demonstrate that variant annotation depends on both the transcript and its version, complicating comparisons between NCBI-based and Ensembl-based systems, such as ClinVar (NCBI) and COSMIC (Ensembl). Even given the same transcript, variants can be represented differently. Over 20% of the variants output by the tools in our comparison carried nomenclature different from that reported by ClinVar for the exact same variant. Using a manually curated ‘gold truth’ set of variants, we found that as many as 75% of non-missense variants are called incorrectly by these tools. The results of our tests have significant implications for the search and annotation of variants during cancer analysis and interpretation, and serve to inform the ongoing adoption and refinement of available resources.

Citation Format: Jennifer Yen, Sarah Garcia, Michael Clark, Steve Chervitz, Brian Linebaugh, Aldrin Montana, John West, Richard Chen, Deanna Church. Challenges in variant searching and annotation for clinical cancer testing. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3612.
