Abstract

Automatic suggestion of alternative terms to refine a user's query is an effective technique for helping the user quickly narrow down to their specific information need. However, evaluating the effectiveness of these suggestions has remained quite subjective, with the vast majority of past work relying on expensive user studies. In this work, we look at this problem from the IR perspective. We propose two objective measures that evaluate the quality of Query Refinement (QR) suggestions, based on the degree to which the documents retrieved by the QR suggestions, when used as queries, capture the overall sub-topical structure underlying the topic of the original query. The first measure, known as Maximum Matching Averaged Mean Average Precision (MM-AMAP), requires labeled documents for the sub-topics underlying the query's topic. The second measure, which we call Distinctness and MAP based F1 (DMAP-F1), requires only labeled documents that are relevant to the original query. We also define a series of simple QR suggestion techniques, each of which is intuitively better than the previous ones, and evaluate them using our measures on the TDT3 and TDT4 corpora. Our experiments show that our evaluation metrics numerically capture our intuitive expectations of performance, thus informally validating our measures. Further, we show that the second metric, DMAP-F1, which does not require sub-topic judgments, produces results that are consistent with, and statistically highly correlated with, those of the first metric. This will allow us to perform extensive evaluations of the quality of QR suggestion techniques on standard TREC collections in the future.
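Both proposed measures build on Mean Average Precision (MAP). As a minimal sketch of that standard building block only, and not of MM-AMAP or DMAP-F1 themselves (whose exact definitions are not given in this abstract), average precision for one ranked list and MAP over several lists could be computed as follows. The function names and the toy document identifiers are hypothetical, chosen for illustration, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Standard (non-interpolated) average precision of one ranked list.

    Assumes `ranked` has no duplicate documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Normalise by the total number of relevant documents.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): two suggestions used as queries.
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
toy_map = mean_average_precision(toy_runs)
```

In a setting like the one described, each QR suggestion would be issued as a query and its ranked result list scored against a set of labeled documents; how those per-suggestion scores are matched to sub-topics or combined with a distinctness term is specific to the paper's measures and is not reproduced here.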

