Abstract
The choice of an appropriate variant calling pipeline for exome sequencing data is becoming increasingly important in translational medicine projects and clinical contexts. Within GOSgene, which facilitates genetic analysis as part of a joint effort of University College London and Great Ormond Street Hospital, we aimed to optimize a variant calling pipeline suitable for our clinical context. We implemented the GATK/Queue framework and evaluated the performance of its two callers: the classical UnifiedGenotyper and the new variant discovery tool HaplotypeCaller. We experimentally validated the loss-of-function (LoF) variants called by the two methods using Sequenom technology. UnifiedGenotyper showed a total validation rate of 97.6% for LoF single-nucleotide polymorphisms (SNPs) and 92.0% for insertions or deletions (INDELs), whereas HaplotypeCaller reached 91.7% for SNPs and 55.9% for INDELs. We confirm that GATK/Queue is a reliable pipeline in translational medicine and clinical contexts. We conclude that, in our working environment, UnifiedGenotyper is the caller of choice: it is an accurate method with a high validation rate for error-prone calls such as LoF variants. Finally, we highlight the importance of experimental validation, especially for INDELs, as part of a standard pipeline in clinical environments.
Highlights
As exome sequencing becomes more widely available in translational medicine research and clinical diagnosis (Need et al 2012), the choice of an accurate and reliable pipeline is of fundamental importance.
Combining data obtained with the two approaches, we identified 25,516 novel single-nucleotide polymorphisms (SNPs) and 9,144 novel INDELs (Fig. 1 and S1).
170 novel LoF SNPs and 269 novel LoF INDELs were identified by both callers, 53 SNPs and 108 INDELs by UnifiedGenotyper only, and 18 SNPs and 228 INDELs by HaplotypeCaller only (Fig. 1).
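The overlap counts reported above can be turned into per-caller totals and a simple concordance figure. The sketch below is only an illustration of that arithmetic, not part of the authors' pipeline: the dictionary layout, the `summarize` helper, and the definition of concordance as shared calls divided by the union of calls are assumptions made here.

```python
# Novel LoF variant counts as reported in the text (Fig. 1).
# The key names are hypothetical labels chosen for this sketch.
counts = {
    "SNP": {"both": 170, "ug_only": 53, "hc_only": 18},
    "INDEL": {"both": 269, "ug_only": 108, "hc_only": 228},
}


def summarize(c):
    """Derive per-caller totals, the union, and a concordance fraction.

    Concordance is defined here (an assumption of this sketch) as the
    number of calls shared by both callers divided by the union of calls.
    """
    ug_total = c["both"] + c["ug_only"]   # all UnifiedGenotyper calls
    hc_total = c["both"] + c["hc_only"]   # all HaplotypeCaller calls
    union = c["both"] + c["ug_only"] + c["hc_only"]
    return {
        "ug_total": ug_total,
        "hc_total": hc_total,
        "union": union,
        "concordance": c["both"] / union,
    }


summary = {variant_type: summarize(c) for variant_type, c in counts.items()}

for variant_type, s in summary.items():
    print(
        f"{variant_type}: UnifiedGenotyper={s['ug_total']}, "
        f"HaplotypeCaller={s['hc_total']}, union={s['union']}, "
        f"concordance={s['concordance']:.1%}"
    )
```

Under these definitions the two callers agree on a considerably larger share of novel LoF SNPs than of novel LoF INDELs, which is consistent with the emphasis the abstract places on validating INDEL calls.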
Summary
As exome sequencing becomes more widely available in translational medicine research and clinical diagnosis (Need et al 2012), the choice of an accurate and reliable pipeline is of fundamental importance. The clinical environment faces additional pressure to reduce the number of false-positive variant calls while keeping sensitivity as high as possible (Ku et al 2011; Flannick et al 2012). Recent studies have highlighted an increased bias toward false-positive calls among loss-of-function (LoF) variants (MacArthur and Tyler-Smith 2010; MacArthur et al 2012), that is, among the polymorphisms likely to be most interesting from a functional point of view and most relevant when analyzing rare diseases with familial inheritance patterns. While single-nucleotide variants (SNVs) are straightforward to annotate and validate, additional effort is needed for insertions/deletions (INDELs) (Lescai et al 2012). We welcome improved methods for their accurate identification.