Abstract

Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, by using barcodes to tag reads that originate from a common long DNA molecule. This technology has been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities for computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve its performance. LRez is implemented in C++, supported on Unix-based platforms and available under the AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary data are available at Bioinformatics Advances online.
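To make the idea of "common barcodes between genomic regions" concrete, the following is a minimal sketch, assuming the barcodes of each region have already been collected into a set. The names `BarcodeSet` and `countCommonBarcodes` are hypothetical and chosen for illustration; they are not part of the LRez API.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical type for illustration: the set of barcodes seen in one region.
using BarcodeSet = std::unordered_set<std::string>;

// Count the barcodes present in both regions (size of the set intersection).
// Iterates over the smaller set and probes the larger one.
std::size_t countCommonBarcodes(const BarcodeSet& a, const BarcodeSet& b) {
    const BarcodeSet& smaller = a.size() <= b.size() ? a : b;
    const BarcodeSet& larger  = a.size() <= b.size() ? b : a;
    std::size_t common = 0;
    for (const std::string& barcode : smaller) {
        if (larger.count(barcode) > 0) {
            ++common;
        }
    }
    return common;
}
```

A high count between two distant regions suggests that reads from both regions came from the same molecules, which is the kind of signal that structural variant calling or scaffolding tools can build on.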

Highlights

  • Linked-Reads technologies, pioneered by 10x Genomics (Medsker et al, 2016), partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. This way, all the sequenced reads that come from a common molecule contain an identical barcode, offering additional data for downstream processing, compared to classical short reads

  • To emphasize the usefulness of LRez, the API is already used in the structural variant calling tool LEVIATHAN (Morisse et al, 2021), where

  • The FASTQ indexing and querying features of the LRez toolkit are currently used in the gap-filling pipeline MTG-Link, to efficiently retrieve read sequences, selected based on their barcodes, for local de novo assembly
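The barcode-based retrieval described in the last highlight can be sketched as an index that maps each barcode to the offsets of its records in an uncompressed FASTQ stream. This is a simplified illustration and not the LRez implementation: the names `BarcodeIndex`, `extractBarcode`, `buildIndex` and `fetchSequences` are hypothetical, the barcode is assumed to appear in the read header as a `BX:Z:<barcode>` tag, and gzipped input (which needs a different offset scheme) is not handled.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index type: barcode -> stream positions of its FASTQ records.
using BarcodeIndex = std::unordered_map<std::string, std::vector<std::streampos>>;

// Extract the barcode from a FASTQ header, assuming a "BX:Z:<barcode>" tag.
// Returns an empty string when the header carries no such tag.
std::string extractBarcode(const std::string& header) {
    const std::string tag = "BX:Z:";
    std::size_t start = header.find(tag);
    if (start == std::string::npos) {
        return "";
    }
    start += tag.size();
    std::size_t end = header.find_first_of(" \t", start);
    std::size_t length = (end == std::string::npos) ? std::string::npos : end - start;
    return header.substr(start, length);
}

// Scan the stream once and record where each barcoded 4-line record starts.
BarcodeIndex buildIndex(std::istream& in) {
    BarcodeIndex index;
    std::string header, sequence, separator, quality;
    while (true) {
        std::streampos offset = in.tellg();
        if (!std::getline(in, header) || !std::getline(in, sequence) ||
            !std::getline(in, separator) || !std::getline(in, quality)) {
            break;  // end of stream or truncated record
        }
        std::string barcode = extractBarcode(header);
        if (!barcode.empty()) {
            index[barcode].push_back(offset);
        }
    }
    return index;
}

// Jump directly to each indexed record and collect its sequence line.
std::vector<std::string> fetchSequences(std::istream& in, const BarcodeIndex& index,
                                        const std::string& barcode) {
    std::vector<std::string> sequences;
    auto it = index.find(barcode);
    if (it == index.end()) {
        return sequences;
    }
    in.clear();  // reset the EOF/fail state left by the indexing pass
    std::string header, sequence;
    for (const std::streampos& offset : it->second) {
        in.seekg(offset);
        if (std::getline(in, header) && std::getline(in, sequence)) {
            sequences.push_back(sequence);
        }
    }
    return sequences;
}
```

Building the index costs one pass over the file, after which each query reads only the records of the requested barcode instead of rescanning the whole file.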



Introduction

Linked-Reads technologies, pioneered by 10x Genomics (Medsker et al, 2016), partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. Three other Linked-Reads technologies have been developed and commercialized in the last two years, namely TELL-seq (Chen et al, 2020), stLFR (Wang et al, 2019) and the open protocol Haplotagging (Meier et al, 2021). These technologies have already produced large amounts of data, and their throughput will likely increase in the future. The lower cost of Haplotagging, compared with long-read technologies, is very attractive, especially for large-population re-sequencing projects.
