Abstract

This study compares several widely used pseudo-relevance feedback (PRF) models and seeks to explain their respective behavior. We first analyze how different PRF models behave, through the characteristics of the terms they select and through their performance on two widely used test collections. This analysis reveals that several well-known models, surprisingly, tend to select very common terms with low inverse document frequency (IDF). We then introduce several conditions that PRF models should satisfy regarding both the terms they select and the way they weight them, before studying whether standard PRF models satisfy these conditions. This study reveals that most models are deficient with respect to at least one condition, and that this deficiency explains the results of our analysis of the behavior of the models, as well as some of the results reported on the respective performance of PRF models. Based on the PRF conditions, we finally propose possible corrections for the simple mixture model. The PRF models obtained after these corrections outperform their standard versions and yield state-of-the-art PRF models, which confirms the validity of our theoretical analysis.

