Abstract

Whole exome sequencing (WES)-based assays undergo rigorous validation before being implemented in diagnostic laboratories. This validation process generates experimental evidence that allows laboratories to predict the performance of the intended assay. The NA12878 Genome in a Bottle (GIAB) HapMap reference sample is commonly used for validation in diagnostic laboratories. We investigated what data points should be taken into consideration when validating WES-based assays using the GIAB reference in a diagnostic setting. We delineate specific factors that require special consideration and identify disease-associated OMIM genes that may 'bypass' validation. Four replicates of the NA12878 sample were sequenced at the CHEO Genetics Diagnostic Laboratory on a NextSeq 500; the data were analyzed using the bcbio-nextgen v1.1.2 pipeline. The hap.py validation engine, the Real Time Genomics vcfeval tool, and the high-confidence (HC) variant calls in HC regions available for the GIAB sample were used to validate the obtained variant calls. The same validation process was then used to evaluate variant calls obtained for the same sample by two other clinical diagnostic laboratories. We showed that variant calls in NA12878 can be confidently measured only in the regions that intersect between the GIAB HC regions and the target regions of exome capture. Of the 4139 OMIM genes (as of October 2019) associated with a phenotype and having a known molecular basis of disease, 84 were fully outside of the GIAB HC regions, and many of the remaining OMIM genes were only partially covered by the HC regions. A significant proportion of variants identified in the NA12878 sample outside of the HC regions have unknown (UNK) status due to the absence of HC reference alleles. Such calls can be verified either against an alternative truth set or by orthogonal testing.
Similarly, many variants outside of exome capture regions, if not accounted for, will be deemed false negatives due to insufficient probe coverage. Our results demonstrate the importance of the intersection between genomic regions of interest, capture regions, and the high-confidence regions. If this intersection is not considered, false and ambiguous variant calls could have a negative impact on the diagnostic accuracy of the intended WES-based diagnostic assay and increase the need for confirmatory testing. To enable laboratories to identify 'problematic' regions and optimize validation efforts, we have made our VCF and BED files available in the UCSC Genome Browser: NA12878 WES Benchmark. Because relevant genes and genome annotations are evolving, we implemented a general-purpose algorithm to cross-reference OMIM genes with the genomic regions of interest; it can be applied to capture genes/regions outside HC regions (see the repository of data material section).
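
The region logic described above can be illustrated with a minimal sketch. It is not the authors' algorithm or any published tool's API. It assumes half-open BED-style intervals on a single chromosome, and the function names (`merge`, `intersect`, `classify_gene`) and all coordinates are hypothetical. It first computes the "measurable" regions as the intersection of the HC regions and the capture targets, then labels a gene as fully covered, partially covered, or outside those regions.

```python
from typing import List, Tuple

# Half-open [start, end) interval on one chromosome, as in BED files.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions present in both interval sets."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def classify_gene(gene: Interval, regions: List[Interval]) -> str:
    """Label a gene by how much of it lies inside the given regions."""
    covered = sum(end - start for start, end in intersect([gene], regions))
    if covered == 0:
        return "outside"
    if covered == gene[1] - gene[0]:
        return "full"
    return "partial"


# Toy coordinates (hypothetical, not real GIAB or capture-kit data).
hc_regions = [(100, 200), (300, 400)]
capture_targets = [(150, 350)]
measurable = intersect(hc_regions, capture_targets)
```

With these toy inputs, `measurable` contains only the stretches that are both high-confidence and captured; a gene spanning a gap between them is labelled "partial", which is the kind of case where calls may end up with unknown status or be counted as false negatives. In practice, the same set operation is usually run per chromosome on real BED files with an established interval tool.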
