Abstract

The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large K-steps. COFI enables a 15-step FM-index using less than 16 GB for a human genome reference of 3 giga base pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and an Skylake-based system (SKX). We achieve average speed-ups of 1.46× and 1.39×, respectively, with respect to an state-of-the-art FM-index implementation that is already well optimized.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.