Abstract

Relevance feedback has been demonstrated to be an effective strategy for improving retrieval accuracy. Existing relevance feedback algorithms based on language models and vector space models are not effective at learning from negative feedback documents, which are abundant when the initial query is difficult. The probabilistic retrieval model has the advantage of naturally improving the estimation of both the relevant and the non-relevant models. The Dirichlet compound multinomial (DCM) distribution, which relies on hierarchical Bayesian modeling techniques, is a more appropriate generative model for the probabilistic retrieval model than the traditional multinomial distribution. We propose a new relevance feedback algorithm, based on a mixture model of the DCM distribution, that effectively models the overlap between the positive and negative feedback documents. As a result, the new algorithm substantially improves retrieval performance for difficult queries. To further reduce the human relevance-judgment effort, we propose a new active learning algorithm that works in conjunction with the new relevance feedback model. This active learning algorithm implicitly models the diversity, density, and relevance of unlabeled data in a transductive experimental design framework. Experimental results on several TREC datasets show that both the relevance feedback and the active learning algorithms significantly improve retrieval accuracy.
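To make the contrast between the two generative models concrete, the following is a minimal sketch of the standard DCM (Pólya) log-probability of a document's word-count vector, next to the ordinary multinomial log-probability. The function names and the toy two-word vocabulary are hypothetical illustrations; the sketch does not reproduce the paper's DCM mixture model or how its parameters are estimated from feedback documents. It only shows one well-known property of the DCM: with small Dirichlet parameters, a document that repeats the same word ("bursty" usage) receives more probability than the multinomial would give it.

```python
import math
from typing import Sequence


def log_multinomial_coef(counts: Sequence[int]) -> float:
    """log( n! / prod_w x_w! ), shared by both models."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def multinomial_log_prob(counts: Sequence[int], theta: Sequence[float]) -> float:
    """Multinomial: each word occurrence is drawn independently from theta."""
    return log_multinomial_coef(counts) + sum(
        c * math.log(t) for c, t in zip(counts, theta) if c > 0
    )


def dcm_log_prob(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """DCM: theta ~ Dirichlet(alpha) is integrated out, so
    p(x | alpha) = n!/prod x_w! * G(A)/G(n+A) * prod G(x_w+a_w)/G(a_w),
    where A = sum(alpha) and G is the gamma function."""
    n = sum(counts)
    a_sum = sum(alpha)
    return (
        log_multinomial_coef(counts)
        + math.lgamma(a_sum)
        - math.lgamma(n + a_sum)
        + sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alpha))
    )


if __name__ == "__main__":
    # Toy example (hypothetical): 2-word vocabulary, documents of length 4.
    theta = [0.5, 0.5]   # multinomial word probabilities
    alpha = [0.5, 0.5]   # DCM parameters with the same mean, alpha / sum(alpha)
    for doc in ([4, 0], [2, 2]):
        p_mult = math.exp(multinomial_log_prob(doc, theta))
        p_dcm = math.exp(dcm_log_prob(doc, alpha))
        print(doc, round(p_mult, 4), round(p_dcm, 4))
    # [4, 0] 0.0625 0.2734   (bursty document: favored by the DCM)
    # [2, 2] 0.375 0.1406
```

Both models have the same expected word proportions here, but the DCM shifts probability mass toward documents that reuse a word. This kind of word-repetition behavior is one commonly cited reason for preferring the DCM over the multinomial when modeling text.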
