Effective collection construction for information retrieval evaluation and optimization

Dan Li

doi:10.1145/3483382.3483401

Abstract

The availability of test collections in the Cranfield paradigm has significantly benefited the development of models, methods and tools in information retrieval. Such test collections typically consist of a set of topics, a document collection and a set of relevance assessments. Constructing them requires effort on several fronts, including topic selection, document selection, relevance assessment, and relevance label aggregation. The work in this thesis provides a fundamental way of constructing and using test collections in information retrieval in an effective, efficient and reliable manner. To that end, we focus on four aspects. We first study document selection when building test collections. We devise an active sampling method for efficient large-scale evaluation [Li and Kanoulas, 2017]. Unlike past sampling-based approaches, we account for the fact that some systems are of higher quality than others, and we design the sampling distribution to over-sample documents from these systems. At the same time, the estimated evaluation measures are unbiased, and the assessments can be used to evaluate new systems without introducing any systematic error. A natural next step is determining when to stop the document selection and assessment procedure. This is an important but understudied problem in the construction of test collections. We treat both the gain from identifying relevant documents and the cost of assessing documents as optimization goals. We handle the problem under the continuous active learning framework by jointly training a ranking model to rank documents and estimating the total number of relevant documents in the collection using a "greedy" sampling method [Li and Kanoulas, 2020]. The next stage of constructing a test collection is assessing relevance.
We study how to denoise relevance assessments by aggregating labels from multiple crowd annotation sources to obtain high-quality relevance assessments. This helps to boost the quality of relevance assessments acquired through crowdsourcing. We assume a Gaussian process prior on query-document pairs to model their correlation. The proposed model, CrowdGP, performs well at inferring true relevance labels. It can also predict relevance labels for new tasks that have no crowd annotations, which is a new functionality. Ablation studies demonstrate that the effectiveness is attributable to modelling task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries. After a test collection is constructed, it can be used either to evaluate retrieval systems or to train a ranking model. We propose to use it to optimize the configuration of retrieval systems. We use a Bayesian optimization approach to model the effect of a δ-step in the configuration space on the effectiveness of the retrieval system, suggesting different similarity functions (covariance functions) for continuous and categorical values, and examine their ability to effectively and efficiently guide the search in the configuration space [Li and Kanoulas, 2018]. Beyond the algorithmic and empirical contributions, work done as part of this thesis also contributed to the research community through the CLEF Technology Assisted Reviews in Empirical Medicine Tracks in 2017, 2018, and 2019 [Kanoulas et al., 2017, 2018, 2019].

Awarded by: University of Amsterdam, Amsterdam, The Netherlands.
Supervised by: Evangelos Kanoulas.
Available at: https://dare.uva.nl/search?identifier=3438a2b6-9271-4f2c-add5-3c811cc48d42.
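The idea of over-sampling documents from better systems while keeping estimates unbiased can be illustrated with a small sketch. This is not the estimator from the thesis: it uses a generic textbook Hansen-Hurwitz estimator for sampling with replacement, and the function names, the 1/(rank + 1) rank prior, the system weights, and the `floor` smoothing are all assumptions made for illustration only.

```python
import random


def sampling_distribution(rankings, weights, n_docs, floor=1e-3):
    """Mix per-system rank priors, giving better systems a larger weight.

    rankings: one ranked list of document ids per system (hypothetical input).
    weights:  one quality weight per system (assumed to be given).
    floor:    small constant so every document keeps a positive probability,
              which the estimator below needs in order to stay unbiased.
    """
    scores = [floor] * n_docs
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking):
            # Assumed rank prior: mass decays as 1 / (rank + 1).
            scores[doc] += weight / (rank + 1)
    total = sum(scores)
    return [s / total for s in scores]


def estimate_num_relevant(relevance, probs, n_draws, seed=0):
    """Hansen-Hurwitz estimate of the total number of relevant documents.

    Documents are drawn with replacement according to `probs`; each draw
    contributes relevance / probability, and the mean over draws is an
    unbiased estimate of sum(relevance).
    """
    rng = random.Random(seed)
    draws = rng.choices(range(len(relevance)), weights=probs, k=n_draws)
    return sum(relevance[d] / probs[d] for d in draws) / n_draws


if __name__ == "__main__":
    # Toy data: 8 documents, 4 of them relevant, two systems of unequal quality.
    relevance = [1, 1, 0, 1, 0, 0, 1, 0]
    rankings = [[0, 1, 3, 2], [6, 0, 5, 4]]
    probs = sampling_distribution(rankings, weights=[0.7, 0.3], n_docs=8)
    print(estimate_num_relevant(relevance, probs, n_draws=2000))
```

Dividing each sampled relevance label by its sampling probability is what removes the bias introduced by the skewed distribution: documents that are sampled more often count for proportionally less. Because the correction depends only on the sampling probabilities and not on any particular system's ranking, the same assessments can be reused to score a system that did not contribute to the distribution.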

Journal: ACM SIGIR Forum
Publication Date: Dec 1, 2020
License type: other-oa

Similar Papers

Test collection management and labeling system
Eunyee Koh ... Andruid Kerne
16 Sep 2009

So many topics, so little time
Giovanna Roda ... Kalervo Järvelin
ACM SIGIR Forum | VOL. 43
25 Jun 2009

The Text REtrieval Conferences (TRECs): Providing a Test‐Bed for Information Retrieval Systems
Donna Harman
Bulletin of the American Society for Information Science and Technology | VOL. 24
01 Apr 1998

The Evolution of Cranfield
Ellen M Voorhees
01 Jan 2019
