Abstract

Many efficient open-source suffix sorters use the induced sorting (IS) method to build the suffix array (SA), a fundamental data structure for compressing and indexing data. Because an implementation bug can silently corrupt the output, checking the SA produced by any IS sorter that carries no engineering warranty of correctness is a de facto requirement. Existing SA checkers commonly run only after the SA has been built completely, and their time and space costs are significant compared with those of the builders. This article proposes an efficient solution for building and checking an SA simultaneously: it enhances the original IS method with a checking scheme that uses hash computations to verify, on the fly, the results produced by the last induction phase of the IS method. For an input over a constant alphabet, this checking scheme requires linear time and constant RAM space when running on external memory, and its time and space overheads are negligible compared with those of building the SA. In our experiments on real-world data, the proposed methods run faster and use less space than existing SA checkers. This work can help provide a value-added feature for open-source IS sorters that guarantees the correctness of a built SA, and such a feature should be desirable for applications that use these sorters.
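
To make concrete what "checking an SA" involves, the following is a minimal sketch of a conventional post-construction checker, the kind of approach the abstract contrasts with. It is not the hash-based, on-the-fly scheme proposed in the article. The function name `check_suffix_array` is hypothetical, and the sketch assumes the text and the SA both fit in memory. It relies on a well-known characterization: an array is the correct SA if and only if it is a permutation of the suffix start positions and, for every adjacent pair, the pair (first character, rank of the following suffix) is strictly increasing.

```python
def check_suffix_array(text, sa):
    """Return True if sa is the suffix array of text (illustrative sketch)."""
    n = len(text)
    if len(sa) != n:
        return False

    # rank[p] = index of suffix p within sa. Filling it also verifies that
    # sa is a permutation of 0..n-1 (every position appears exactly once).
    rank = [-1] * n
    for i, p in enumerate(sa):
        if not isinstance(p, int) or not 0 <= p < n or rank[p] != -1:
            return False
        rank[p] = i

    def rank_of(p):
        # The empty suffix (p == n) sorts before every non-empty suffix.
        return rank[p] if p < n else -1

    # Adjacent suffixes must be ordered by first character, with ties
    # broken by the rank of the suffix that follows that character.
    for i in range(n - 1):
        a, b = sa[i], sa[i + 1]
        if text[a] > text[b]:
            return False
        if text[a] == text[b] and rank_of(a + 1) >= rank_of(b + 1):
            return False
    return True


if __name__ == "__main__":
    print(check_suffix_array("banana", [5, 3, 1, 0, 4, 2]))
    print(check_suffix_array("banana", [5, 1, 3, 0, 4, 2]))
```

This sketch runs in linear time but needs the complete SA plus an extra rank array of the same length, which illustrates the space cost of checking after construction that motivates verifying results during the build instead.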
