Abstract

Background

Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identification of indels in next generation sequencing data is a challenge, and algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next generation sequencing data (48 samples of a 200 gene target exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection (Pindel, Genome Analysis Tool Kit's UnifiedGenotyper and HaplotypeCaller).

Results

We observed variation in indel calls across the three algorithms. The intersection of the three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels were of lower read depth and likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjustments to parameters to account for varied read depth and number of samples per run. Adjustments to Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that surpassed the length capabilities of the GATK algorithms.

Conclusions

Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate of indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. Pindel allows for optimization of minimum support for events and is best used for detection of larger indels at lower read depths.

Electronic supplementary material

The online version of this article (doi:10.1186/1756-0500-7-864) contains supplementary material, which is available to authorized users.
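As an illustration of the three-tool intersection figures reported in the abstract, the following is a minimal sketch of how such a concordance percentage could be computed. It is not the authors' pipeline. It assumes each call is keyed by chromosome, position, reference allele and alternate allele, that two calls agree only when those keys match exactly, and that the denominator is the union of all distinct calls; the source does not state which denominator was used. The function name and the toy call sets are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed key for an indel call: (chromosome, 1-based position, ref allele, alt allele).
Indel = Tuple[str, int, str, str]


def three_way_concordance(calls: Dict[str, Set[Indel]]) -> float:
    """Return the percentage of all distinct calls that every caller made.

    Assumption: the denominator is the union of calls across callers.
    """
    union = set().union(*calls.values())
    if not union:
        return 0.0
    shared = set.intersection(*calls.values())
    return 100.0 * len(shared) / len(union)


# Toy call sets invented for illustration only (not data from the study).
calls = {
    "Pindel": {
        ("chr1", 100, "AT", "A"),
        ("chr1", 250, "G", "GCC"),
        ("chr2", 75, "CTT", "C"),
        ("chr3", 10, "A", "AG"),
    },
    "UnifiedGenotyper": {("chr1", 100, "AT", "A"), ("chr2", 75, "CTT", "C")},
    "HaplotypeCaller": {("chr1", 100, "AT", "A"), ("chr1", 250, "G", "GCC")},
}

print(f"{three_way_concordance(calls):.2f}% of distinct calls are shared by all three tools")
```

Exact key matching is a simplification: different callers can report the same indel at different positions within a repeat, so a real comparison would likely normalize the calls before intersecting them.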

Highlights

  • Insertions/deletions are the second most common type of genomic variant and the most common type of structural variant

  • Pindel made significantly more (p < 0.0005) indel calls in the targeted exon sequencing (TES) and whole exome sequencing (WES) samples than either of the Genome Analysis Toolkit (GATK) tools

  • Pindel called 49 indels per sample in the TES and 847.6 indels per sample in the WES. By comparison, UnifiedGenotyper and HaplotypeCaller called 3.92 and 3.73 indels per sample in the TES and 435 and 321 indels per sample in the WES, respectively

Summary

Introduction

Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant [1], with an expected ~1.6 million collective indel polymorphisms in the human population [2]. We compared three commonly used indel detection tools (Pindel, the Genome Analysis Toolkit's (GATK) UnifiedGenotyper, and GATK's HaplotypeCaller) on a diverse set of targets in human next generation sequencing (NGS) data: 48 samples of 200-gene targeted exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing. One study reported a 92% validation rate using UnifiedGenotyper for indel detection in human whole exome data [14]. Another study, using simulated short indels, reported high positive predictive value for UnifiedGenotyper but decreased sensitivity at lower read depths (< 10x) [15].

