Abstract

Background: Complex insertions and deletions (indels) in next-generation sequencing (NGS) data are prone to escape detection by currently available variant callers, as shown by large-scale human genomics studies. Somatic and germline complex indels in key disease driver genes could therefore be missed in NGS-based genomics studies.

Results: INDELseek is an open-source complex indel caller designed for NGS data of random fragments and PCR amplicons. The key differentiating factor of INDELseek is that each NGS read alignment is examined as a whole, instead of examining a “pileup” of each reference position across multiple alignments. In benchmarking against the reference material NA12878 genome (n = 160 derived from high-confidence variant calls), GATK, SAMtools and INDELseek showed complex indel detection sensitivities of 0%, 0% and 100%, respectively. INDELseek also detected all known germline (BRCA1 and BRCA2) and somatic (CALR and JAK2) complex indels in human clinical samples (n = 8). Further experiments validated all 10 detected KIT complex indels in a discovery cohort of clinical samples. In silico semi-simulation showed sensitivities of 93.7–96.2% based on 8671 unique complex indels in >5000 genes from dbSNP and COSMIC. We also demonstrated the importance of complex indel detection in accurately annotating BRCA1, BRCA2 and TP53 mutations with gained or rescued protein-truncating effects.

Conclusions: INDELseek is an accurate and versatile tool for complex indel detection in NGS data. It complements other variant callers in NGS-based genomics studies targeting a wide spectrum of genetic variations.
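
The read-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not INDELseek's actual implementation: the function names (`read_events`, `call_complex_indels`), the 0-based coordinates and the `max_gap` clustering threshold are all assumptions made for illustration. The sketch walks the CIGAR string of a single alignment, records every mismatch, insertion and deletion, and merges events that lie close together in the same read into one candidate complex indel.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def read_events(ref, pos, cigar, seq):
    """List (ref_start, ref_end, read_start, read_end) for each difference
    between one read and the reference. All coordinates are 0-based,
    half-open; `pos` is the reference position of the first aligned base."""
    events = []
    r, q = pos, 0  # cursors on the reference and on the read
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: compare one by one
            for i in range(n):
                if ref[r + i] != seq[q + i]:
                    events.append((r + i, r + i + 1, q + i, q + i + 1))
            r += n
            q += n
        elif op == "I":  # bases present only in the read
            events.append((r, r, q, q + n))
            q += n
        elif op == "D":  # bases present only in the reference
            events.append((r, r + n, q, q))
            r += n
        elif op == "S":  # soft clip consumes read bases only
            q += n
        elif op == "N":  # skipped region consumes reference only
            r += n
        # H and P consume neither sequence
    return events

def call_complex_indels(ref, pos, cigar, seq, max_gap=5):
    """Merge events of ONE read that are at most `max_gap` reference bases
    apart (hypothetical threshold) and report each cluster of two or more
    events as (ref_start, ref_allele, alt_allele)."""
    calls, cluster = [], []

    def flush():
        if len(cluster) > 1:
            r0, r1 = cluster[0][0], cluster[-1][1]
            q0, q1 = cluster[0][2], cluster[-1][3]
            calls.append((r0, ref[r0:r1], seq[q0:q1]))

    for ev in read_events(ref, pos, cigar, seq):
        if cluster and ev[0] - cluster[-1][1] > max_gap:
            flush()
            cluster = []
        cluster.append(ev)
    flush()
    return calls

ref = "ACGTACGTACGTACGT"
# One read carrying a 3-base deletion immediately followed by a 1-base insertion.
print(call_complex_indels(ref, 2, "4M3D1I3M", "GTACCCGT"))
```

Because the deletion and the insertion are read from the same alignment, they are reported together as a single `GTA` to `C` replacement rather than as two unrelated events at neighbouring positions, which is the haplotype information a per-position pileup does not retain.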

Highlights

  • We observed closely spaced single-nucleotide variants (SNVs) that appeared in trans in the alignments; 26 such loci were manually curated as negative controls for complex indel detection (Additional file 1: Table S2)

  • To demonstrate the importance of accurate complex indel detection in clinical settings, we focused on 127 multiple-nucleotide variants (MNVs) in hereditary breast and/or ovarian cancer (HBOC) genes and compared their variant annotation results (Variant Effect Predictor) in two scenarios: (1) annotating the original MNV and (2) decomposing the MNV into individual single-nucleotide variants for separate annotation, as if the MNV could not be called as a haplotype
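
The two annotation scenarios in the last highlight can be illustrated with a toy Python sketch. It does not call Variant Effect Predictor; it only translates a single codon with the standard genetic code, and the helper names (`apply_snvs`, `annotate`, `compare`) and the example codons are hypothetical choices for illustration, not variants taken from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def apply_snvs(codon, snvs):
    """Apply (offset, alt_base) substitutions to a codon; offsets are 0-2."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)

def annotate(codon, snvs):
    """Return the amino acid change as 'REF>ALT'."""
    return CODON_TABLE[codon] + ">" + CODON_TABLE[apply_snvs(codon, snvs)]

def compare(codon, snvs):
    """Scenario 1: annotate all SNVs together as one haplotype.
    Scenario 2: annotate each SNV separately, as if unphased."""
    haplotype = annotate(codon, snvs)
    decomposed = [annotate(codon, [snv]) for snv in snvs]
    return haplotype, decomposed

# Gained truncation: neither SNV alone creates a stop codon, the pair does.
print(compare("CAT", [(0, "T"), (2, "A")]))
# Rescued truncation: one SNV alone looks like a stop-gain, the pair is missense.
print(compare("CAG", [(0, "T"), (1, "G")]))
```

In the first case the haplotype annotation is a stop-gain (`H>*`) while the decomposed annotations are both missense; in the second the decomposed view reports a stop-gain (`Q>*`) that the haplotype (`Q>W`) does not carry. Both situations are the kind of gained or rescued protein-truncating effect the abstract refers to.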

Introduction

Complex insertions and deletions (indels) are a known class of genetic variation [1] associated with human diseases [2]. A complex indel arises from the simultaneous deletion and insertion of DNA fragments. When the two fragments differ in size, the result is a net change in length; when they are the same size, there is no net change in length, as in contiguous or non-contiguous multiple-nucleotide variants (MNVs). Recent studies revealed the shortcomings of state-of-the-art variant callers, which might fail to detect somatic and germline complex indels [3, 4]. Important mutations in key disease driver genes

