Abstract
The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, but their calls show low concordance. Hence, a systematic comparison of variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) was published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Using the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers (Genome Analysis Toolkit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes, and Ion Proton Variant Caller (TVC)), on twelve data sets for the NA12878 genome sequenced on different platforms, including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed that the different variant callers are biased toward specific types of SNP genotyping errors. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.
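To make the benchmarking idea concrete, the following is a minimal sketch of how one pipeline's calls might be scored against a gold standard call set. It is not the method used in the study: the `benchmark` and `normalize_gt` functions, the site key (chromosome, position, ref, alt), and the choice to count a genotype mismatch as neither a true positive nor a false positive are all illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical representation: a variant site is keyed by
# (chromosome, position, ref allele, alt allele); the value is a
# genotype string such as "0/1" or "1/1".
Site = Tuple[str, int, str, str]
Calls = Dict[Site, str]


def normalize_gt(gt: str) -> str:
    """Treat phased and unphased genotypes alike and ignore allele order."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def benchmark(truth: Calls, query: Calls) -> Dict[str, float]:
    """Score a query call set against a truth set (illustrative only)."""
    shared = set(truth) & set(query)
    # True positive: same site and same (normalized) genotype.
    tp = sum(1 for s in shared if normalize_gt(truth[s]) == normalize_gt(query[s]))
    # Site found, but genotype disagrees with the truth set.
    genotype_errors = len(shared) - tp
    fp = len(set(query) - set(truth))  # called, but absent from truth
    fn = len(set(truth) - set(query))  # in truth, but not called
    sensitivity = tp / len(truth) if truth else 0.0
    precision = tp / len(query) if query else 0.0
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "genotype_errors": genotype_errors,
        "sensitivity": sensitivity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from NA12878.
    truth = {
        ("chr1", 100, "A", "G"): "0/1",
        ("chr1", 200, "C", "T"): "1/1",
        ("chr2", 300, "G", "A"): "0/1",
        ("chr2", 400, "T", "C"): "0/1",
    }
    query = {
        ("chr1", 100, "A", "G"): "1|0",  # matches after normalization
        ("chr1", 200, "C", "T"): "0/1",  # genotype error
        ("chr2", 300, "G", "A"): "0/1",  # match
        ("chr3", 500, "A", "T"): "0/1",  # false positive
    }
    print(benchmark(truth, query))
```

Reporting genotype errors separately from false positives and false negatives is one way to surface the caller-specific genotyping biases the abstract describes. A real comparison would also need to restrict scoring to high-confidence regions and reconcile different representations of the same variant, which this sketch does not attempt.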
Highlights
The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants
We could analyze only one data set for Ion Proton, as only a single Ion Proton sequence data set for NA12878 was available at the time of this study
The affordability of next generation sequencing (NGS) allows the clinical application of whole exome sequencing (WES) or whole genome sequencing (WGS) to identify personal genome variants for the practice of genomic medicine
Summary
The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. ... technologies, seven read mappers and three variant calling methods, and manually arbitrated between discordant data sets. They provided higher-confidence calls and regions by integrating version 2.19 of the GIAB calls, genomic information from the pedigree of NA12878, and the Illumina Platinum project variant calls. This highly accurate and presumably mostly unbiased set of SNP and indel genotype calls for NA12878 is the only gold standard variant genotype data set publicly available for systematic comparisons of variant callers.