Abstract

Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.

Highlights

  • Structural variations (SVs) represent genomic rearrangements such as deletions, insertions, inversions, duplications, and translocations whose sizes are larger than 50 bp [1]

  • As the largest divergences across human genomes [2], structural variations (SVs) are closely related to human diseases, evolution, gene regulation, and other phenotypes

  • To assess the ability to detect various types of SVs more comprehensively, we further employed the Genome in a Bottle Consortium (GIAB) Ashkenazi Trio Pacific Biosciences (PacBio) CLR datasets (HG002, HG003, and HG004) to evaluate the recall rates and Mendelian discordance rates (MDRs). cuteSV and SVIM obtained > 95% mean recall rates, i.e., more than 95% of homozygous parental SVs were confirmed in the offspring (Fig. 3c and Additional file 1: Table S11). cuteSV was 1% lower than SVIM on recall rate; this does not indicate a lower sensitivity of cuteSV, but arises because about 15% of the SVs in the parental callsets discovered by SVIM had no genotypes, so they could not be assessed, which decreased the total number of homozygous parental SVs
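
The trio-based recall described in the last highlight can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that parental calls are given as hypothetical `(sv_id, genotype)` pairs and that a parental SV counts as confirmed when the same identifier appears in the offspring callset. A real comparison would match calls by SV type, breakpoint position, and size within some tolerance. The function name `homozygous_recall` and the example identifiers are illustrative only.

```python
from typing import Iterable, Optional, Set, Tuple

# Genotype strings treated as homozygous for the alternate allele (VCF-style GT).
HOM_ALT = {"1/1", "1|1"}


def homozygous_recall(
    parent_calls: Iterable[Tuple[str, str]],
    offspring_ids: Set[str],
) -> Tuple[Optional[float], int]:
    """Return (recall, n_assessed) for homozygous parental SVs.

    Assumption: SVs are matched by identifier only, which is a simplification.
    Calls without a usable genotype (e.g. "./.") are skipped, so they never
    enter the denominator.
    """
    assessed = [sv_id for sv_id, gt in parent_calls if gt in HOM_ALT]
    if not assessed:
        return None, 0
    confirmed = sum(1 for sv_id in assessed if sv_id in offspring_ids)
    return confirmed / len(assessed), len(assessed)


# Toy parental callset: four homozygous calls, one heterozygous, one ungenotyped.
parent_calls = [
    ("sv1", "1/1"),
    ("sv2", "1/1"),
    ("sv3", "0/1"),
    ("sv4", "./."),
    ("sv5", "1/1"),
    ("sv6", "1/1"),
]
offspring_ids = {"sv1", "sv2", "sv3", "sv5"}

recall, n_assessed = homozygous_recall(parent_calls, offspring_ids)
print(f"recall = {recall:.2f} over {n_assessed} assessed SVs")
```

In this toy input the ungenotyped call `sv4` is excluded from the denominator, which shows why recall rates computed over callsets with different fractions of genotyped SVs are not directly comparable.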


Summary

Introduction

Structural variations (SVs) represent genomic rearrangements such as deletions, insertions, inversions, duplications, and translocations whose sizes are larger than 50 bp [1]. Efforts have been made to develop short-read-based SV calling approaches [12, 13]. With the rapid development of long-read sequencing technologies, such as the Pacific Biosciences (PacBio) [23] and Oxford Nanopore Technologies (ONT) [24] platforms, long-range spanning information provides the opportunity to detect SVs more comprehensively and at a higher resolution [25]. Novel computational approaches are required to handle the high sequencing error rates (typically 5–20%) and large
