Towards Designing Better Session Search Evaluation Metrics

Mengyang Liu, Cheng Luo, Yiqun Liu, Jiaxin Mao, Shaoping Ma

doi:10.1145/3209978.3210097

Abstract

User satisfaction has received considerable attention in recent Web search evaluation studies and is regarded as the ground truth for designing better evaluation metrics. However, most existing studies focus on the relationship between satisfaction and evaluation metrics at the query level. As search requests become more complex, there are many scenarios in which multiple queries and multi-round search interactions are needed (e.g., exploratory search). In those cases, the relationship between session-level search satisfaction and session search evaluation metrics remains uninvestigated. In this paper, we analyze how users' perceptions of satisfaction accord with a series of session-level evaluation metrics. We conduct a laboratory study in which users are required to complete complex search tasks and provide usefulness judgments of documents as well as session-level and query-level satisfaction feedback. We test a number of popular session search evaluation metrics as well as different weighting functions. Experimental results show that query-level satisfaction is mainly determined by the clicked document that the user considers most useful (maximum effect), while session-level satisfaction is highly correlated with the most recently issued queries (recency effect). We further propose a number of criteria for designing better session search evaluation metrics.

Full Text