Abstract

This paper proposes a novel method that uses cloud model theory to automatically derive a distinct weight for each term in a query, with the aim of improving the performance of BM25 in information retrieval. The strong performance of BM25 is mainly attributed to its sub-linear Term Frequency (TF) normalization formula. Research on cloud model theory shows that the uncertainty of a query term's distribution across different documents contributes distinctively to relevance evaluation. This paper therefore introduces cloud model theory into BM25-based information retrieval systems so that the intrinsic law of term distribution is taken into account. Using the digital characteristics of the cloud model, we reduce the noise in the query and automatically assign each query term a distinct weight that revises the BM25 score. Furthermore, the modified cloud model can be adapted to BM25F, a variant of BM25. Experiments on the NTCIR-5 (the 5th NII Test Collection for IR Systems) document collection for SLIR (Single Language IR) show that our method improves retrieval effectiveness compared with standard BM25.
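The two ingredients named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. `bm25_score` uses the common BM25 form with a saturating TF component and a smoothed IDF (the `+ 1` inside the logarithm is one widespread variant, assumed here). `cloud_characteristics` estimates the three characteristics usually given for a cloud model (expectation Ex, entropy En, hyper-entropy He) with the backward cloud generator as it is commonly described. The abstract does not say how these characteristics are turned into term weights, so that step is left out, and the `term_weights` parameter is a hypothetical hook that simply accepts precomputed per-term weights.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len,
               k1=1.2, b=0.75, term_weights=None):
    """Score one document against a query with BM25.

    term_weights is a hypothetical per-term multiplier (default 1.0);
    how the paper derives it from the cloud model is not shown here.
    """
    tf = Counter(doc_terms)
    # Document-length normalization shared by every term in this document.
    norm = k1 * (1.0 - b + b * len(doc_terms) / avg_doc_len)
    score = 0.0
    for term in query_terms:
        f = tf.get(term, 0)
        if f == 0:
            continue
        n_t = doc_freq.get(term, 0)
        # Smoothed IDF (assumed variant; always positive).
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        # Sub-linear TF: grows with f but saturates below (k1 + 1).
        saturated_tf = f * (k1 + 1.0) / (f + norm)
        weight = 1.0 if term_weights is None else term_weights.get(term, 1.0)
        score += weight * idf * saturated_tf
    return score


def cloud_characteristics(samples):
    """Estimate (Ex, En, He) from samples with a backward cloud generator.

    samples could be, for example, one term's frequency in each document
    that contains it (an assumption made for illustration).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    ex = sum(samples) / n
    # Entropy from the first-order absolute central moment.
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in samples) / n
    # Unbiased sample variance.
    variance = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Hyper-entropy; clamped at zero when the variance falls below En^2.
    he = math.sqrt(max(variance - en ** 2, 0.0))
    return ex, en, he
```

With `b=0.0` the length normalization is switched off, which makes the saturation easy to see: each additional occurrence of a term adds less to the score than the previous one. Passing `term_weights={"cloud": 2.0}` doubles that term's contribution and leaves the rest of the formula unchanged.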
