Abstract

Inverted files are the most commonly used technique for efficient query processing and fast text searching in Information Retrieval Systems (IRS). However, inverted files become extremely large as the volume of data in the information retrieval system grows rapidly. Compression techniques are therefore used to reduce the index size and increase access speed. In this paper, we propose a new integer compression technique called Fast Extended Golomb Code (FEGC), based on Extended Golomb Code (EGC), to reduce the size of the inverted index and to increase its decoding speed. Decoding speed is critical to the speed of query processing in IRS applications. We have implemented FEGC and EGC and compared their performance with other existing techniques. Experimental results show that EGC performs well and gives better compression than the other existing techniques, and that it also compresses somewhat better than FEGC. However, EGC requires more CPU cycles than FEGC to encode and decode an integer. Hence FEGC could be a faster encoder than EGC while giving comparable results in compressing doc-ids for IRS applications.
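
For readers unfamiliar with the setting, the following is a minimal sketch of how a sorted doc-id list can be compressed with a Golomb-style code. It is not the EGC or FEGC scheme proposed in this paper, whose details are not given in the abstract. It assumes the classical Golomb-Rice variant (divisor restricted to a power of two, 2**k) applied to the gaps between consecutive doc-ids. All function names and the parameter k are illustrative, and the bit stream is modelled as a Python string of '0' and '1' characters for readability rather than speed.

```python
def rice_encode(n: int, k: int) -> str:
    """Encode a positive integer n as unary quotient + k-bit binary remainder."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    q = (n - 1) >> k                 # quotient of (n - 1) divided by 2**k
    r = (n - 1) & ((1 << k) - 1)     # remainder of (n - 1) divided by 2**k
    remainder_bits = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + remainder_bits


def rice_decode(bits: str, k: int) -> list[int]:
    """Decode a concatenation of Rice codewords back into integers."""
    values = []
    i = 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":        # read the unary quotient
            q += 1
            i += 1
        i += 1                       # skip the terminating '0'
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) + r + 1)
    return values


def compress_doc_ids(doc_ids: list[int], k: int) -> str:
    """Gap-encode a strictly increasing list of positive doc-ids."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return "".join(rice_encode(g, k) for g in gaps)


def decompress_doc_ids(bits: str, k: int) -> list[int]:
    """Rebuild the doc-ids by summing the decoded gaps."""
    doc_ids = []
    total = 0
    for gap in rice_decode(bits, k):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 8, 20, 21]     # illustrative posting list, not data from the paper
    encoded = compress_doc_ids(postings, k=2)
    assert decompress_doc_ids(encoded, k=2) == postings
    print(len(encoded), "bits for", len(postings), "doc-ids")
```

Small gaps receive short codewords, which is why gap encoding is usually paired with codes of this family. The choice of k trades the length of the unary part against the length of the fixed-width remainder.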
