Abstract

Similarity measures are widely used in information retrieval research. Most work has focused on query-document or document-document similarity, with little attention to how users perceive similarity in the context of their information need. In this study, we collect user preference judgements of web document similarity in order to investigate (1) the correlation between similarity measures and users' perception of similarity, and (2) the correlation between web document features, combined with document-query features, and users' similarity judgements. We analyze how well various similarity methods predict user preferences in both unsupervised and supervised settings. We show that a supervised approach using many features predicts user preferences at close to the level of agreement between users, and moreover achieves a 15% improvement in AUC over an unsupervised approach.
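
As a rough illustration of the evaluation described above, the sketch below scores an unsupervised similarity measure against preference judgements using AUC. It is not the study's method: the bag-of-words cosine measure, the judgement format (reference document, candidate A, candidate B, and whether the user preferred A), the toy data, and the function names `cosine_similarity` and `auc` are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def auc(scores, labels):
    """Probability that a positive example outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical judgements: (reference, candidate A, candidate B, user preferred A?)
judgements = [
    ("cheap flights to paris", "paris flights cheap deals", "python list tutorial", 1),
    ("cheap flights to paris", "garden soil tips", "book flights to paris", 0),
    ("python sorting tutorial", "sorting lists in python", "paris hotel reviews", 1),
    ("python sorting tutorial", "weather forecast today", "python tutorial on sorting", 0),
]

# The measure "votes" for A when sim(ref, A) - sim(ref, B) is positive.
scores = [cosine_similarity(ref, a) - cosine_similarity(ref, b) for ref, a, b, _ in judgements]
labels = [preferred_a for _, _, _, preferred_a in judgements]

print(f"AUC of cosine similarity on toy judgements: {auc(scores, labels):.2f}")
```

A supervised variant would replace the single similarity difference with a feature vector per judgement and a trained classifier, then compare its AUC with the unsupervised score on held-out judgements.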
