Abstract
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents in the context of information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting more than 50,000 magnitude estimation judgments. Our analysis shows that on average magnitude estimation judgments are rank-aligned with ordinal judgments made by expert relevance assessors. An advantage of magnitude estimation is that users can choose their own scale for judgments, allowing deeper investigations of user perceptions than when categorical scales are used. We explore the application of magnitude estimation for IR evaluation, calibrating two gain-based effectiveness metrics, nDCG and ERR, directly from user-reported perceptions of relevance. A comparison of TREC system effectiveness rankings based on binary, ordinal, and magnitude estimation relevance shows substantial variation; in particular, the top systems ranked using magnitude estimation and ordinal judgments differ markedly. Analysis of the magnitude estimation scores shows that this effect is due in part to varying perceptions of relevance, in terms of how impactful relative differences in document relevance are perceived to be. We further use magnitude estimation to investigate gain profiles, comparing the currently assumed linear and exponential approaches with actual user-reported relevance perceptions. This indicates that the currently used exponential gain profiles in nDCG and ERR are mismatched with an average user, but perhaps more importantly that individual perceptions are highly variable. These results have direct implications for IR evaluation, suggesting that current assumptions about a single view of relevance being sufficient to represent a population of users are unlikely to hold. Finally, we demonstrate that magnitude estimation judgments can be reliably collected using crowdsourcing, and are competitive in terms of assessor cost.
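As a concrete illustration of what calibrating gain-based metrics from relevance perceptions involves, the sketch below computes nDCG and ERR over a small ranked list, once with the conventional exponential gain profile and once with a gain profile obtained by normalising one assessor's magnitude estimation scores. The example scores and the per-assessor normalisation are illustrative assumptions, not the paper's calibration procedure.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the standard log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains):
    """nDCG: DCG of the ranking, normalised by DCG of the ideal reordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def err(stop_probs):
    """Expected Reciprocal Rank, given a per-rank probability that the user
    is satisfied and stops at that document."""
    p_continue, total = 1.0, 0.0
    for rank, p in enumerate(stop_probs, start=1):
        total += p_continue * p / rank
        p_continue *= 1.0 - p
    return total

def exponential_gain(label, max_label=3):
    """Conventional exponential gain profile: (2^label - 1) / 2^max_label."""
    return (2 ** label - 1) / (2 ** max_label)

def magnitude_gain(score, max_score):
    """Assumed illustrative mapping: normalise an assessor's magnitude
    estimation score by that assessor's own maximum reported score."""
    return score / max_score if max_score > 0 else 0.0

# Ordinal labels for one ranked result list (0 = non-relevant ... 3 = highly relevant)
labels = [3, 1, 0, 2, 0]
# Magnitude estimation scores for the same documents from one hypothetical assessor
me_scores = [90.0, 15.0, 1.0, 40.0, 1.0]
me_max = max(me_scores)

exp_gains = [exponential_gain(l) for l in labels]
me_gains = [magnitude_gain(s, me_max) for s in me_scores]

print(f"nDCG  exponential gain: {ndcg(exp_gains):.3f}   magnitude gain: {ndcg(me_gains):.3f}")
print(f"ERR   exponential gain: {err(exp_gains):.3f}   magnitude gain: {err(me_gains):.3f}")
```

Because magnitude estimation lets each assessor choose their own scale, any mapping of raw scores into a bounded gain (here, division by the assessor's maximum score) is itself a modelling decision; this is precisely the calibration question the study examines.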