Abstract

16GT is a variant caller for Illumina whole-genome and whole-exome sequencing data. It uses a new 16-genotype probabilistic model to unify single nucleotide polymorphism and insertion and deletion calling in a single variant calling algorithm. In benchmark comparisons with 5 other widely used variant callers on a modern 36-core server, 16GT demonstrated improved sensitivity in calling single nucleotide polymorphisms, and its sensitivity and accuracy in calling insertions and deletions were comparable to those of the GATK HaplotypeCaller. 16GT is available at https://github.com/aquaskyline/16GT.

Highlights

  • Single nucleotide polymorphisms (SNPs) and insertions and deletions that occur at a specific genome position are interdependent; i.e., evidence that elevates the probability of one variant type should decrease the probability of other possible variant types, and the probabilities of all possible alleles should sum to 1

  • In order to detect SNPs and indels with a unified approach, we developed a new 16-genotype probabilistic model and its implementation named 16GT

  • We benchmarked 16GT against GATK UnifiedGenotyper, GATK HaplotypeCaller (McKenna, et al, 2010), Freebayes (Garrison and Marth, 2012), Fermikit (Li, 2015) and ISAAC (Raczy, et al, 2013) using a set of very high-confidence variants developed by the Genome in a Bottle (GIAB) project for genome NA12878 (Zook, et al, 2014)

Introduction

Single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) that occur at a specific genome position are interdependent; i.e., evidence that elevates the probability of one variant type should decrease the probability of other possible variant types, and the probabilities of all possible alleles should sum to 1. Widely used tools such as GATK's UnifiedGenotyper (McKenna, et al, 2010) and SAMtools (Li, et al, 2009) use separate models for SNP and indel detection. The model for SNP calling in these two tools is nearly identical: both assume all variants are biallelic and use a probabilistic model allowing for 10 genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). For indel calling, the GATK UnifiedGenotyper uses a model derived from Dindel (Albers, et al, 2011), while SAMtools’ model is from BAQ (Li, 2011).
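
The 10 SNP genotypes listed above are the unordered pairs of the four bases, and the requirement that probabilities sum to 1 amounts to normalizing over a single shared genotype set. The following Python sketch illustrates only those two points; it is not the 16GT model or the code of any tool named here. The function names `diploid_genotypes` and `normalize`, and the example weights, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"


def diploid_genotypes(alleles=BASES):
    """Return every unordered pair of alleles as a two-letter genotype string."""
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]


def normalize(weights):
    """Scale non-negative genotype weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one genotype weight must be positive")
    return {genotype: w / total for genotype, w in weights.items()}


genotypes = diploid_genotypes()
print(len(genotypes), genotypes)

# Made-up weights for three genotypes; every other genotype gets weight 0.
example_weights = {g: 0.0 for g in genotypes}
example_weights.update({"AA": 6.0, "AC": 3.0, "CC": 1.0})
posteriors = normalize(example_weights)
print(posteriors["AA"], posteriors["AC"], posteriors["CC"])
```

Because all genotypes share one normalization, raising the weight of any single genotype necessarily lowers the normalized probability of every other genotype, which is the interdependence the paragraph above describes.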
