Abstract
Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, by using barcodes to tag reads that originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities for computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance. LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary data are available at Bioinformatics Advances online.
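To make the notion of "common barcodes between genomic regions" concrete, the following is a minimal C++ sketch of the underlying computation: given the set of barcodes observed in each of two regions, count how many are shared. It does not use the LRez API; the names `BarcodeSet` and `countCommonBarcodes` are hypothetical and introduced only for illustration, and barcodes are assumed to be plain strings.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical type for illustration: the set of distinct barcodes
// observed among the reads aligned to one genomic region.
using BarcodeSet = std::unordered_set<std::string>;

// Count the barcodes present in both regions. Iterating over the smaller
// set and probing the larger one keeps the expected cost proportional to
// the size of the smaller set.
std::size_t countCommonBarcodes(const BarcodeSet& a, const BarcodeSet& b) {
    const BarcodeSet& smaller = a.size() <= b.size() ? a : b;
    const BarcodeSet& larger = a.size() <= b.size() ? b : a;
    std::size_t common = 0;
    for (const std::string& barcode : smaller) {
        if (larger.count(barcode) > 0) {
            ++common;
        }
    }
    return common;
}
```

A high count between two distant regions suggests that reads from both regions come from the same long molecules, which is the kind of signal that scaffolding and structural variant calling can exploit.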
Highlights
Linked-Reads technologies, pioneered by 10x Genomics (Medsker et al, 2016), partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. This way, all the sequenced reads that come from a common molecule carry an identical barcode, offering additional information for downstream processing compared to classical short reads.
To emphasize the usefulness of LRez, the API is already used in the structural variant calling tool LEVIATHAN (Morisse et al, 2021), where
The FASTQ indexing and querying features of the LRez toolkit are currently used in the gap-filling pipeline MTG-Link to efficiently retrieve read sequences, selected based on their barcodes, for local de novo assembly.
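The idea behind barcode-based indexing and querying of a FASTQ file can be sketched as follows. This is not LRez's implementation or API, only a minimal illustration under stated assumptions: records are uncompressed and span exactly four lines, and the barcode is stored in the header line as a `BX:Z:<barcode>` tag. The names `barcodeOf`, `BarcodeIndex`, `buildIndex` and `fetchSequences` are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the sketch in memory
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: the barcode appears in the FASTQ header as "BX:Z:<barcode>".
// Returns an empty string when the header carries no such tag.
std::string barcodeOf(const std::string& header) {
    const std::string tag = "BX:Z:";
    const std::size_t pos = header.find(tag);
    if (pos == std::string::npos) return "";
    const std::size_t start = pos + tag.size();
    const std::size_t end = header.find_first_of(" \t", start);
    return header.substr(start, end == std::string::npos ? end : end - start);
}

// Maps each barcode to the stream positions of the records that carry it.
using BarcodeIndex = std::unordered_map<std::string, std::vector<std::streampos>>;

// Single pass over the stream: remember where each record starts,
// grouped by barcode. Assumes four-line FASTQ records.
BarcodeIndex buildIndex(std::istream& in) {
    BarcodeIndex index;
    std::string header, seq, plus, qual;
    while (true) {
        const std::streampos offset = in.tellg();
        if (!std::getline(in, header) || !std::getline(in, seq) ||
            !std::getline(in, plus) || !std::getline(in, qual)) {
            break;  // end of input or truncated record
        }
        const std::string barcode = barcodeOf(header);
        if (!barcode.empty()) index[barcode].push_back(offset);
    }
    return index;
}

// Query: jump directly to each indexed record and return its sequence line,
// without rescanning the whole stream.
std::vector<std::string> fetchSequences(std::istream& in,
                                        const BarcodeIndex& index,
                                        const std::string& barcode) {
    std::vector<std::string> sequences;
    const auto it = index.find(barcode);
    if (it == index.end()) return sequences;
    std::string header, seq;
    for (const std::streampos offset : it->second) {
        in.clear();  // reset EOF/fail state left by earlier reads
        in.seekg(offset);
        if (std::getline(in, header) && std::getline(in, seq)) {
            sequences.push_back(seq);
        }
    }
    return sequences;
}
```

The sketch works on any seekable `std::istream`, such as a `std::ifstream` or an in-memory `std::istringstream`. The index is built once, and each later query costs one seek per matching record. Gzipped input would need a different offset scheme, which is outside the scope of this illustration.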
Summary
Linked-Reads technologies, pioneered by 10x Genomics (Medsker et al, 2016), partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. Three other Linked-Reads technologies have been developed and commercialized in the last two years, namely TELL-seq (Chen et al, 2020), stLFR (Wang et al, 2019) and the open protocol Haplotagging (Meier et al, 2021). These technologies have already produced large amounts of data, and their throughput will likely increase in the future. The lower cost of Haplotagging compared with long-read technologies is very attractive, especially for large-population re-sequencing projects.