Abstract

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
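The F-measure described above is the harmonic mean of precision and recall, and it can be computed directly from validated true positive, false positive and false negative counts. The following is a minimal sketch for illustration only, not the authors' evaluation code; the function name `f_measure` and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return the harmonic mean of precision and recall.

    Counts are assumed to come from a validation step such as
    Sanger sequencing; the function itself is a hypothetical helper.
    """
    predicted = true_positives + false_positives  # all calls made by the tool
    actual = true_positives + false_negatives     # all real indels in the sample
    if predicted == 0 or actual == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 validated calls, 2 false calls, 2 missed indels.
example_score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller that reports many unvalidated indels (low precision) or misses many real ones (low recall) scores poorly even if the other quantity is high.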

Highlights

  • We investigated intermediate-size indels predicted by IMSindel using human DNA samples from a consortium for congenital neurological diseases and hearing loss

  • Several whole exome sequencing (WES) analyses have recently succeeded in identifying causal mutations of Mendelian diseases[18,19]

  • The reported detection rates for the deleterious mutations range from 25% to 50%[20,21]

Summary

Introduction

We evaluated the intermediate-size indel candidates predicted by IMSindel in the NA18943 and NA18948 samples, all of which were checked by Sanger sequencing. In NA12878, PacBio long-read sequencing data were used to validate the 17 predicted indels, of which one was a false positive (for details see Materials and Methods). Similar results were obtained in NA18948 and NA12878, respectively: 0.71 and 0.78 for IMSindel, 0.43 and 0.36 for GATK HaplotypeCaller, 0.58 and 0.61 for PINDEL, and 0.39 and 0.47 for ScanIndel (Table 2). This demonstrates that IMSindel was superior to the other three methods for detecting intermediate-size indels.

