Abstract

Inverted indexes are the mainstream data structure in Information Retrieval systems, and many techniques have been proposed to compress them. This paper explores compression efficiency on a two-level inverted index tailored for n-gram indices. We use two compression techniques, Optimal PForDelta (OptPFD) and IPC. Both are applied to our previous work, which developed a threshold for efficiently storing subsequences in a one- or two-level inverted index based on their number of occurrences in a biological sequence. We study the performance of the two compression algorithms as the threshold varies. The compression ratio of OptPFD is affected by changes in the threshold, and it is as efficient as on text documents. IPC, in contrast, also performs differently for each threshold but is more stable, although it is much less efficient than on text documents.
