The Full-text index in Minute space (FM-index) is a memory-efficient data structure widely used in bioinformatics for solving the fundamental pattern-matching task of searching for short patterns within a long reference. With the demand for short query patterns, the k-ordered concept has been proposed for FM-indexes. However, few construction algorithms in the state of the art fully exploit this idea to achieve significant speedups in the pan-genome era. We introduce the k-ordered Induced Suffix Sorting (kISS) for efficient construction and utilization of k-ordered FM-indexes. We present an algorithmic workflow for building k-ordered suffix arrays, incorporating two novel strategies to improve time and memory efficiency. We also demonstrate the compatibility of integrating k-ordered FM-indexes with locate operations in FMtree. Experiments show that kISS can improve the construction time, and the generated k-ordered suffix array can also be applied to FMtree without any additional in computation or memory usage. https://github.com/jhhung/kISS. Supplementary data are available at Bioinformatics online.
Read full abstract