Fewer topics? A million topics? Both?! On topics subsets in test collections

Kevin Roitero, J. Shane Culpepper, Stefano Mizzaro, Falk Scholer, Mark Sanderson

doi:10.1007/s10791-019-09357-w

Abstract

When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used? We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something that has not been examined in past work. Our analysis finds that a subset of topics can be found that is as accurate as the full topic set at ranking runs. Further, we show that the size of the subset, relative to the full topic set, can be substantially smaller than was shown in past work. We also study the topic subsets in the context of the power of statistical significance tests. We find that there is a trade-off in using such sets, in that significant results may be missed, but the loss of statistical significance is much smaller than when selecting random subsets. We also find topic subsets that can result in a low-accuracy test collection, even when the number of queries in the subset is quite large. These negatively correlated subsets suggest we still lack good methodologies that provide stability guarantees on topic selection in new collections. Finally, we examine whether clustering of topics is an appropriate strategy to find and characterize good topic subsets. Our results contribute to the understanding of information retrieval effectiveness evaluation, and offer insights for the construction of test collections.
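To make the notion of a subset being "as accurate as the full topic set at ranking runs" concrete, here is a minimal sketch. It assumes accuracy is measured as Kendall's tau between the run ranking induced by mean effectiveness over a topic subset and the ranking over all topics; the abstract does not name the correlation measure, so this choice is an assumption. The function names (`mean_scores`, `kendall_tau`, `subset_accuracy`) and the toy per-topic scores are hypothetical and not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def mean_scores(scores, topics):
    """Mean effectiveness of each run over the given topics."""
    return {run: mean(per_topic[t] for t in topics)
            for run, per_topic in scores.items()}


def kendall_tau(x, y):
    """Kendall's tau-a between two score dicts keyed by the same runs."""
    runs = sorted(x)
    concordant = discordant = 0
    for a, b in combinations(runs, 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(runs) * (len(runs) - 1) // 2
    return (concordant - discordant) / n_pairs


def subset_accuracy(scores, subset):
    """Agreement between the subset-based and full-set run rankings."""
    all_topics = sorted(next(iter(scores.values())))
    return kendall_tau(mean_scores(scores, all_topics),
                       mean_scores(scores, subset))


# Toy data (made up for illustration): per-topic scores for three runs.
scores = {
    "runA": {"t1": 0.9, "t2": 0.2, "t3": 0.6, "t4": 0.5},
    "runB": {"t1": 0.5, "t2": 0.4, "t3": 0.4, "t4": 0.3},
    "runC": {"t1": 0.1, "t2": 0.6, "t3": 0.2, "t4": 0.1},
}

good = subset_accuracy(scores, ["t1", "t3"])  # same ordering as the full set
bad = subset_accuracy(scores, ["t2"])         # reversed ordering
```

In this toy data the subset `["t1", "t3"]` reproduces the full-set ordering of runs, while `["t2"]` reverses it, which is a small-scale analogue of the negatively correlated subsets the abstract mentions.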

Journal: Information Retrieval Journal
Publication Date: May 8, 2019
Citations: 8

Similar Papers

Effective collection construction for information retrieval evaluation and optimization
Dan Li
ACM SIGIR Forum | VOL. 54
01 Dec 2020

On the independence of statistical randomness tests included in the NIST test suite
Fatih Sulak ... Onur Koçak
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES | VOL. 25
01 Jan 2017

Intelligent topic selection for low-cost information retrieval evaluation: A New perspective on deep vs. shallow judging
Mucahid Kutlu ... Matthew Lease
Information Processing and Management | VOL. 54
22 Sep 2017

The Perceived Similarity of Photos - A Test-Collection Based Evaluation Framework for the Content-Based Image Retrieval Algorithms
Eero Sormunen ... Kalervo Jarvelin
01 Jan 1998
