Abstract

Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) retain both single-base-level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing with co-barcoded reads increases the signal-to-noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short-read SV caller smoove with stLFRsv to detect smaller variants. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, achieving 74.5% precision and 22.4% recall for deletions. stLFRsv revealed some large variants that are not included in the benchmark set but were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
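
The gap-based breakpoint signal described in the abstract can be illustrated with a minimal Python sketch. This is not the stLFRsv implementation: the function name `find_large_gaps`, the `(barcode, chrom, pos)` input tuples, and the `max_gap` threshold are all hypothetical and chosen only for illustration. The sketch groups read positions by barcode and chromosome and reports adjacent reads that lie farther apart than the threshold.

```python
from collections import defaultdict


def find_large_gaps(reads, max_gap=50_000):
    """Report unusually large gaps between reads sharing a barcode.

    reads   -- iterable of (barcode, chrom, pos) tuples (hypothetical format)
    max_gap -- illustrative threshold in bp; NOT a value taken from stLFRsv
    Returns a list of (barcode, chrom, left_pos, right_pos) candidates.
    """
    # Collect the mapped positions of each barcode on each chromosome.
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    candidates = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        # Compare each read with its nearest downstream neighbour.
        for left, right in zip(positions, positions[1:]):
            if right - left > max_gap:
                candidates.append((barcode, chrom, left, right))
    return candidates


example_reads = [
    ("bc1", "chr1", 1_000),
    ("bc1", "chr1", 3_000),
    ("bc1", "chr1", 90_000),  # 87 kbp away from the previous bc1 read
    ("bc2", "chr1", 500),
    ("bc2", "chr1", 2_500),
]
gaps = find_large_gaps(example_reads)
```

In this toy input only barcode `bc1` contains a gap above the threshold. A real caller would additionally need the steps the abstract mentions, such as haplotype phasing and barcode sharing profiles, to separate true breakpoints from noise.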

Highlights

  • Structural variants (SVs) represent genome variants larger than 50 bp consisting of deletions, insertions, inversions, duplications, and translocations (Feuk et al, 2006; Alkan et al, 2011).

  • We down-sampled the data to 50× and 30× coverage and called variants separately to provide guidance for applications. stLFRsv was assessed on the HG002 genome in manual parameter mode against the following four SV callers: Long Ranger, NAIBR, smoove, and GROC-SVs (Spies et al, 2017; Elyanow et al, 2018; Marks et al, 2019).

  • We present stLFRsv, a co-barcoded read-based structural variation detector that identifies large variants with far fewer false positives than alignment-based detectors using either short reads or long reads. stLFRsv shows the best computational performance among co-barcoded read-based SV callers.

Introduction

Structural variants (SVs) represent genome variants larger than 50 bp consisting of deletions, insertions, inversions, duplications, and translocations (Feuk et al, 2006; Alkan et al, 2011). SVs contribute more genomic sequence differences between genomes than single-nucleotide polymorphisms (SNPs) or small indels (Pang et al, 2010). Some SVs are pathogenic variants associated with specific diseases (Singleton et al, 2003; Jongmans et al, 2006; Rovelet-Lecrux et al, 2006). Over the last 20 years, several technologies have improved SV annotation and have helped to generate a well-characterized human genome reference sequence that facilitates the development of SV identification tools (Zook et al, 2019 [Preprint]). Each sequencing technique has unique advantages and disadvantages that contribute to the discovery of SV profiles among populations.
