Abstract

We implement an information retrieval (IR) system with query expansion on a low-cost, high-performance PC cluster. The system indexes its document collections with an inverted-index file (IIF) and uses the vector space model as its ranking strategy. Query expansion adds terms to the original query to raise retrieval effectiveness; in this work, we use query expansion based on a collocation-based similarity measure. In our parallel IR system, the IIF is partitioned into pieces using two declustering methods, lexical and greedy. For each incoming user query, which contains multiple terms after query expansion, the terms are sent to the nodes that hold the relevant pieces of the IIF and are evaluated there in parallel. Using two standard Korean test collections, we study how query performance is affected by query expansion and by the two declustering methods. In our experiments, the greedy method shows an overall improvement of about 20% over the lexical method.
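
To make the partitioning step concrete, the following is a minimal sketch of how an IIF might be split across nodes and how the terms of an expanded query might be routed to them. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not define the greedy criterion, so `greedy_partition` here uses a generic load-balancing heuristic (longest posting list first, placed on the least-loaded node). All function names and the toy postings are hypothetical.

```python
from collections import defaultdict


def lexical_partition(postings, num_nodes):
    """Assign terms to nodes in contiguous alphabetical ranges."""
    terms = sorted(postings)
    chunk = -(-len(terms) // num_nodes)  # ceiling division
    return {term: i // chunk for i, term in enumerate(terms)}


def greedy_partition(postings, num_nodes):
    """Assumed greedy rule (not taken from the paper): place terms in
    descending posting-list length on the currently least-loaded node."""
    loads = [0] * num_nodes
    assignment = {}
    for term in sorted(postings, key=lambda t: (-len(postings[t]), t)):
        node = min(range(num_nodes), key=lambda n: loads[n])
        assignment[term] = node
        loads[node] += len(postings[term])
    return assignment


def route_query(query_terms, assignment):
    """Group the query terms by the node that holds their postings."""
    per_node = defaultdict(list)
    for term in query_terms:
        if term in assignment:  # terms missing from the index are skipped
            per_node[assignment[term]].append(term)
    return dict(per_node)


def busiest_node_cost(query_terms, assignment, postings):
    """Postings scanned by the most heavily loaded node for one query."""
    routed = route_query(query_terms, assignment)
    costs = [sum(len(postings[t]) for t in ts) for ts in routed.values()]
    return max(costs, default=0)


# Toy inverted index: term -> list of document ids (hypothetical data).
toy_postings = {
    "apple": [1, 2, 3, 4],
    "apply": [1, 2, 3],
    "banana": [2],
    "cherry": [5],
}
lexical = lexical_partition(toy_postings, num_nodes=2)
greedy = greedy_partition(toy_postings, num_nodes=2)
expanded_query = ["apple", "apply"]
```

In this toy case the lexical split places both query terms on the same node, so that node scans all of their postings, whereas the assumed greedy rule spreads them over two nodes. Since parallel evaluation is bounded by the busiest node, this kind of imbalance is one plausible reason a declustering method can affect query performance; the sketch makes no claim about the 20% figure reported above.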
