Abstract

One of the key challenges facing genomics today is efficiently storing the massive amounts of data generated by next-generation sequencing platforms. Reference-based compression is a popular strategy for reducing the size of genomic data, whereby sequence information is encoded as a mapping to a known reference sequence. Determining the mapping is a computationally intensive problem, and is the bottleneck of most reference-based compression tools currently available. This paper presents the first FPGA acceleration of reference-based compression for genomic data. We develop a new mapping algorithm based on the FM-index search operation which includes optimisations targeting the compression ratio and speed. Our hardware design is implemented on a Maxeler MPC-X2000 node comprising 8 Altera Stratix V FPGAs. When evaluated against compression tools currently available, our tool achieves a superior compression ratio, compression time, and energy consumption for both FASTA and FASTQ formats. For example, our tool achieves a 30% higher compression ratio and is 71.9 times faster than the fastqz tool.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call