The strange case of reproducibility versus representativeness in contextual suggestion test collections

Thaer Samar, Arjen P De Vries, Alejandro Bellogín

doi:10.1007/s10791-015-9276-9

Abstract

Test collections are the most common means of measuring the effectiveness of Information Retrieval systems. The TREC Contextual Suggestion (CS) track provides an evaluation framework for systems that recommend items to users given their geographical context. The track allows participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. The judging pool distinguishes documents from the Open Web and from ClueWeb12; hence, each system submission must be based on only one resource, either the Open Web (identified by URLs) or ClueWeb12 (identified by ids). For the sake of reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but systems that build their suggestion algorithms on top of input taken from the Open Web have been found to achieve consistently higher effectiveness. Because most systems take a rather similar approach to making contextual suggestions, this raises the question of whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems by analyzing the relevance assessments of documents identified from both sources. We then identify the documents that overlap between the Open Web and ClueWeb12 relevance assessments, and observe that relevance assessments depend on whether the document was taken from the Open Web or from ClueWeb12. Finally, we identify documents in the Open Web relevance assessments that exist in the ClueWeb12 collection but not in the ClueWeb12 relevance assessments, and use them to expand the ClueWeb12 relevance assessments.
Our main findings are twofold. First, our empirical analysis of the relevance assessments from two years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially among the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web is a step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time achieving reproducible results on a well-known representative sample of the web.
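
The overlap and expansion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a judgment is keyed by a (profile, context, document) triple, that ratings are integers, and that a URL-to-ClueWeb12-id mapping is available. The function name `expand_clueweb_qrels`, the data layout, and the ids used below are all hypothetical.

```python
from typing import Dict, Tuple

# Assumed layout (not from the paper): (profile_id, context_id, doc_key) -> rating.
# doc_key is a URL for Open Web judgments and a ClueWeb12 id for ClueWeb12 ones.
Qrels = Dict[Tuple[str, str, str], int]


def expand_clueweb_qrels(
    open_web_qrels: Qrels,
    clueweb_qrels: Qrels,
    url_to_cw_id: Dict[str, str],
) -> Tuple[Dict[Tuple[str, str, str], Tuple[int, int]], Qrels]:
    """Return (overlap, expanded).

    overlap  maps judgments present in both sets to (open_web_rating,
             clueweb_rating), so the two ratings can be compared.
    expanded is the ClueWeb12 judgments plus Open Web judgments whose URL
             exists in ClueWeb12 but was not judged there.
    """
    overlap: Dict[Tuple[str, str, str], Tuple[int, int]] = {}
    expanded: Qrels = dict(clueweb_qrels)  # copy; the input is left untouched

    for (profile, context, url), ow_rating in open_web_qrels.items():
        cw_id = url_to_cw_id.get(url)
        if cw_id is None:
            continue  # this URL is not part of the ClueWeb12 collection
        key = (profile, context, cw_id)
        if key in clueweb_qrels:
            # Judged under both sources: keep both ratings side by side.
            overlap[key] = (ow_rating, clueweb_qrels[key])
        else:
            # In the collection but unjudged there: reuse the Open Web rating.
            expanded[key] = ow_rating
    return overlap, expanded


if __name__ == "__main__":
    # Toy data with placeholder URLs and ids, for illustration only.
    ow = {
        ("u1", "c1", "http://a.example"): 3,
        ("u1", "c1", "http://b.example"): 2,
        ("u1", "c1", "http://c.example"): 4,
    }
    cw = {("u1", "c1", "cw-doc-1"): 1}
    mapping = {"http://a.example": "cw-doc-1", "http://b.example": "cw-doc-2"}
    overlap, expanded = expand_clueweb_qrels(ow, cw, mapping)
    print(overlap)
    print(expanded)
```

In this sketch the overlap keeps both ratings so that any dependence on the document source can be inspected, while the expanded set never overwrites an existing ClueWeb12 judgment.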


Journal: Information Retrieval Journal
Publication Date: Dec 28, 2015
Citations: 11
License type: CC BY 4.0

