Abstract
The evaluation of information retrieval (IR) systems over special collections, such as large book repositories, is out of reach of traditional methods that rely upon editorial relevance judgments. Increasingly, the use of crowdsourcing to collect relevance labels has been regarded as a viable alternative that scales at modest cost. However, crowdsourcing suffers from undesirable worker practices and low-quality contributions. In this paper we investigate the design and implementation of effective crowdsourcing tasks in the context of book search evaluation. We observe the impact of aspects of the Human Intelligence Task (HIT) design on the quality of relevance labels provided by the crowd. We assess the output in terms of label agreement with a gold standard data set and observe the effect of the crowdsourced relevance judgments on the resulting system rankings. This enables us to observe the effect of crowdsourcing on the entire IR evaluation process. Using the test set and experimental runs from the INEX 2010 Book Track, we find that varying the HIT design and the pooling and document ordering strategies leads to considerable differences in agreement with the gold set labels. We then observe the impact of the crowdsourced relevance label sets on the relative system rankings using four IR performance metrics. System rankings based on MAP and Bpref remain less affected by the different label sets, while Precision@10 and nDCG@10 lead to dramatically different system rankings, especially for labels acquired from HITs with weaker quality controls. Overall, we find that crowdsourcing can be an effective tool for the evaluation of IR systems, provided that care is taken when designing the HITs.
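To make the ranking comparison concrete: each system run can be scored under both the gold-standard labels and a crowdsourced label set, and the agreement between the two induced system rankings can then be measured. The sketch below illustrates this idea for Precision@10 with Kendall's tau as the rank-correlation measure; it is a minimal illustration under assumed data structures, not the evaluation pipeline used in the paper, and all function and variable names (precision_at_10, rank_systems, ranking_agreement) are hypothetical.

from scipy.stats import kendalltau

def precision_at_10(ranked_doc_ids, relevant_ids):
    # Fraction of the top-10 retrieved documents that are labeled relevant.
    top10 = ranked_doc_ids[:10]
    return sum(1 for d in top10 if d in relevant_ids) / 10.0

def rank_systems(runs, relevant_ids):
    # Order system names by mean Precision@10 over all topics (best first).
    # runs: {system_name: {topic_id: [doc_id, ...]}}
    # relevant_ids: {topic_id: set of doc_ids judged relevant under one label set}
    scores = {
        name: sum(precision_at_10(docs, relevant_ids[t])
                  for t, docs in topics.items()) / len(topics)
        for name, topics in runs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def ranking_agreement(runs, gold_labels, crowd_labels):
    # Kendall's tau between the system rankings induced by the two label sets;
    # a value near 1 means the crowdsourced labels rank the systems like the gold set.
    gold_pos = {s: i for i, s in enumerate(rank_systems(runs, gold_labels))}
    crowd_pos = {s: i for i, s in enumerate(rank_systems(runs, crowd_labels))}
    systems = sorted(runs)
    tau, _ = kendalltau([gold_pos[s] for s in systems],
                        [crowd_pos[s] for s in systems])
    return tau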