Abstract
The vast quantities of short-read sequencing data being generated are often exchanged and stored as aligned reads. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Here we describe Bazam, a tool that efficiently extracts the original paired FASTQ from alignment files (BAM or CRAM format) in a format that directly allows efficient realignment. Bazam facilitates up to a 90% reduction in the time for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions for applications such as targeted region analyses, quality control, structural variant calling, and alignment comparisons.
Highlights
Bazam design: pairing of reads. The primary challenge in extracting paired reads from BAM and CRAM files arises from the predominant choice of coordinate-sorted ordering for their storage.
One possibility is to retrieve each mate as needed by performing a random seek within the file to the mate's location.
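To make the pairing problem concrete, here is a minimal Python sketch of one alternative to random seeks: stream the reads in coordinate order and hold each read in memory until its mate appears. This is an illustration only, and this excerpt does not state that Bazam works this way. The `Read` record and the `pair_reads` function are hypothetical names, and the sketch ignores details of real BAM or CRAM records such as first/second-in-pair flags, unmapped reads, and secondary or supplementary alignments.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical, simplified stand-in for an aligned read record."""
    name: str   # identifier shared by both mates of a pair
    chrom: str  # contig the read is aligned to
    pos: int    # alignment start position
    seq: str    # read bases


def pair_reads(sorted_reads: Iterable[Read]) -> Iterator[Tuple[Read, Read]]:
    """Yield (earlier_mate, later_mate) from reads given in coordinate order."""
    waiting: Dict[str, Read] = {}  # reads whose mate has not been seen yet
    for read in sorted_reads:
        mate = waiting.pop(read.name, None)
        if mate is None:
            # First mate encountered: buffer it until its partner turns up.
            waiting[read.name] = read
        else:
            # Second mate encountered: the pair is complete.
            yield mate, read


# Coordinate-sorted input: the mates of each pair are not adjacent.
reads = [
    Read("pairA", "chr1", 100, "ACGT"),
    Read("pairB", "chr1", 150, "GGCC"),
    Read("pairA", "chr1", 300, "TTAA"),
    Read("pairB", "chr2", 50, "CCGG"),
]
pairs = list(pair_reads(reads))
```

The trade-off in this sketch is memory: the buffer grows with the number of reads whose mates lie far away in the file (for example, on another chromosome), whereas random seeks keep memory low at the cost of many scattered file accesses.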
Summary
The wide-scale adoption of high-throughput genomic sequencing instruments over the last 10 years has generated vast quantities of genomic data with enormous potential for future use. Genomic data is often stored and exchanged as aligned reads in a coordinate-sorted BAM or CRAM format. This format is common because many applications (such as viewing the alignment or routine variant calling) can use it directly. Storage in aligned form, however, has the significant disadvantage that the data is tied to the reference genome and alignment method used. To make optimal use of the data, users often need to realign it to a recent reference genome build. The result is a widespread and growing need to realign genomic data efficiently.