Abstract
Fast pattern matching is a requirement for many problems, especially in bioinformatics sequence analysis tasks such as short-read mapping. This work presents a variation of the FM-index method, denoted n-step FM-index, that is applied to exact-match genome search.

We propose an alternative two-dimensional FM-index structure that allows backward-search navigation in steps of n symbols at a time. The main advantages of this arrangement are a reduction in computational work and, more importantly, a reduction by a factor of n in the chain of dependent data accesses, together with an increase in the temporal locality of the data access pattern. This benefit comes at the expense of increasing the total amount of data required for the index.

We present an in-depth performance analysis of a multi-core implementation of the algorithm using large references (up to 1.5G). We identify memory latency as the major performance limiter for single-thread execution and memory bandwidth as the major limiter for multi-thread execution. Our proposal provides speedups ranging from 1.4× to 2.4× when there is no limitation on DRAM capacity.

We also analyse the trade-off of compacting the proposed data structure in order to reduce memory capacity requirements, now at the expense of increasing execution time. An extra 33% of DRAM space allows our proposal to improve performance by 1.2×, while doubling DRAM size enables an additional 1.5×.

The proposed n-step algorithm offers a way for algorithms with pseudo-random memory access patterns to be redesigned so that they scale on current and future computer systems.
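To make the idea of n-symbol backward-search steps concrete, the following is a minimal Python sketch, not the implementation evaluated in this work. It rests on several assumptions: the names `build_index`, `step` and `count` are hypothetical, the index is built naively by sorting all cyclic rotations, the occurrence table is replaced by per-n-gram lists of row numbers queried with binary search, and pattern symbols left over when the pattern length is not a multiple of n are handled with ordinary single-symbol steps.

```python
from bisect import bisect_left
from collections import defaultdict

def build_index(text, k):
    """Naively build a k-step index over text + '$' (illustration only)."""
    t = text + "$"
    m = len(t)
    # Sort all cyclic rotations; '$' is unique, so the rotations are distinct.
    sa = sorted(range(m), key=lambda i: t[i:] + t[:i])
    rows = defaultdict(list)  # k-gram -> sorted rows preceded by that k-gram
    for row, start in enumerate(sa):
        gram = "".join(t[(start - k + j) % m] for j in range(k))
        rows[gram].append(row)
    # C[g]: number of rotations whose first k symbols sort before g.
    C, total = {}, 0
    for gram in sorted(rows):
        C[gram] = total
        total += len(rows[gram])
    return sa, rows, C

def step(index, gram, sp, ep):
    """Map the half-open row range [sp, ep) backwards over one k-gram."""
    _, rows, C = index
    if gram not in rows:
        return 0, 0
    r = rows[gram]
    # bisect_left(r, i) counts rows below i that are preceded by gram.
    return C[gram] + bisect_left(r, sp), C[gram] + bisect_left(r, ep)

def count(pattern, idx_n, idx_1, n):
    """Count exact occurrences, consuming n symbols per step when possible."""
    sp, ep = 0, len(idx_n[0])
    end = len(pattern)
    while end >= n and sp < ep:
        sp, ep = step(idx_n, pattern[end - n:end], sp, ep)
        end -= n
    while end > 0 and sp < ep:  # leftover symbols: single-symbol steps
        sp, ep = step(idx_1, pattern[end - 1:end], sp, ep)
        end -= 1
    return max(ep - sp, 0)

text = "ACGTACGTTACG"
idx2, idx1 = build_index(text, 2), build_index(text, 1)
print(count("ACG", idx2, idx1, 2))  # 3 (matches at positions 0, 4 and 9)
```

With n = 2, a pattern of length 100 needs 50 dependent range updates instead of 100, which illustrates the shortened dependency chain. The number of distinct n-grams, and therefore the table size, grows with n, in line with the larger index footprint noted above.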