Abstract

Inverted indexes are used in the vast majority of information systems. Growing requirements for efficient query processing have motivated the development of various compression techniques with different space-time characteristics. A single encoder occupies a relatively stable point on the space-time tradeoff curve, but moving flexibly along that curve to fit different information retrieval tasks can be a better way to prepare the index. Recent research has proposed integrating different encoders within the same index: exploiting access skew by compressing frequently accessed regions with faster encoders and rarely accessed regions with more succinct encoders, thereby improving efficiency while minimizing the compressed size. However, these methods are either inefficient or operate at a coarse granularity. To address these issues, we introduce the concept of bicriteria compression, which formalizes the problem of optimally trading compressed size against query processing time for inverted indexes. We also adopt a Lagrangian relaxation algorithm that solves this problem by reducing it to a knapsack-type problem; it runs in O(n log n) time and O(n) space, with a negligible additive approximation. Furthermore, this algorithm can be extended via dynamic programming to further improve query efficiency. We perform extensive experiments showing that, given a bounded time or space budget, our method can optimally trade one for the other, with more efficient indexing and query performance.
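
To make the Lagrangian relaxation idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the index is already split into blocks, that each block comes with a list of hypothetical (compressed size, expected decode time) options, one per candidate encoder, and that the goal is to minimize total time under a space budget. For a multiplier lam, each block independently picks the option minimizing time + lam * size, and a plain bisection on lam looks for the smallest multiplier whose assignment fits the budget. All names (`Option`, `pick`, `assign`, `totals`, `lagrangian_assign`) are illustrative, and the sketch makes no attempt to match the O(n log n) bound stated above.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-block option: (compressed size, expected decode time).
Option = Tuple[int, float]


def pick(options: Sequence[Option], lam: float) -> int:
    """Index of the option minimizing time + lam * size for one block."""
    return min(range(len(options)),
               key=lambda j: options[j][1] + lam * options[j][0])


def assign(blocks: Sequence[Sequence[Option]], lam: float) -> List[int]:
    """Choose one encoder per block independently for a fixed multiplier."""
    return [pick(opts, lam) for opts in blocks]


def totals(blocks: Sequence[Sequence[Option]],
           choice: Sequence[int]) -> Tuple[int, float]:
    """Total (size, time) of an assignment."""
    size = sum(blocks[i][j][0] for i, j in enumerate(choice))
    time = sum(blocks[i][j][1] for i, j in enumerate(choice))
    return size, time


def lagrangian_assign(blocks: Sequence[Sequence[Option]],
                      budget: int, iters: int = 50) -> List[int]:
    """Approximately minimize total time subject to total size <= budget."""
    # lam = 0 ignores size: every block takes its fastest option.
    choice = assign(blocks, 0.0)
    if totals(blocks, choice)[0] <= budget:
        return choice

    # Grow lam until the assignment fits; give up if it never does.
    lo, hi = 0.0, 1.0
    while totals(blocks, assign(blocks, hi))[0] > budget:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no assignment fits the space budget")

    # Bisection: keep the feasible assignment with the smallest lam seen.
    best = assign(blocks, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        cand = assign(blocks, mid)
        if totals(blocks, cand)[0] <= budget:
            hi, best = mid, cand
        else:
            lo = mid
    return best


# Two blocks, each with a fast/large and a slow/small option (made-up numbers).
blocks = [[(10, 1.0), (4, 3.0)],
          [(10, 1.0), (6, 2.0)]]
choice = lagrangian_assign(blocks, budget=16)
```

On this toy input the sketch keeps the fast encoder for the first block and switches the second block to the compact one, since that block gives up the least time per unit of space saved. Like any Lagrangian approach to a knapsack-type problem, it can only return assignments that are optimal for some multiplier, so the result may be slightly worse than the true optimum for the exact budget.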
