Abstract

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA|+O(nlgDlglgD) or 2|CSA|+o(n) bits of space, where CSA is a full-text index. Using monotone minimal perfect hash functions (mmphfs), we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA|+O(nlglglgD) bits. We also improve current solutions that use 2|CSA|+o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. We give proof-of-concept experimental results that show that using mmphfs may provide relevant practical tradeoffs for document listing with frequencies.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.