Abstract

BackgroundThe identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are important genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to detect MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads.ResultsThe algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp.ConclusionsTo our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-015-2305-7) contains supplementary material, which is available to authorized users.

Highlights

  • The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads

  • We focus on micro-inversions (MIs), defined as small inversions shorter than 100 bp, which are mostly ignored by existing methods

  • Gustaf is a tool based on a split-read approach, which records the local alignments provided by other tools and draws a split-read graph to use standard graph algorithms to evaluate relationships of the alignments

Read more

Summary

Introduction

The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. As no mapping information is available for the initially unmapped reads, most existing tools are incapable to detect MIs shorter than the read He et al BMC Genomics 2016, 17(Suppl 1): length. Speaking, existing methods use features such as read depth, read pair information, and split reads to identify SVs [15]. The strategies in utilizing read pairs (e.g., BreakDancer, DELLY) and split reads (e.g., Pindel, PRISM, Gustaf) are not appropriate to identify MIs less than 100 bp, because the size of SVs detected by these tools is dependent on insert size of paired-end reads and insensitive to variants as small as MIs. to increase the applicability of the method, it is important to handle datasets with low coverage. It is challenging to identify MIs in a read with multiple variants, as other types of variants (e.g., single nucleotide variations (SNVs), small indels and other rearrangements), may occur in nearby regions of the MI

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call