Abstract

Image search engines differ significantly from general web search engines in how they present search results. This difference leads to distinct interaction and examination behavior patterns, and therefore calls for changes in evaluation methodology. However, image search evaluation still relies on methods designed for general web search. In particular, offline metrics are computed from coarse-grained topical relevance judgments under the assumption that users examine results sequentially. In this article, we investigate crowdsourcing annotation methods for image search evaluation through a lab-based user study. Using user satisfaction as the gold standard, we make several interesting findings. First, compared with item-based annotation, annotating relevance row by row is more efficient without hurting performance. Second, beyond topical relevance, image quality plays a crucial role in evaluating image search results, and its importance varies with search intent. Third, fine-grained annotation significantly outperforms the traditional four-level relevance scale. To the best of our knowledge, our work is the first to systematically study how different factors in data annotation affect image search evaluation. Our results suggest different strategies for exploiting crowdsourcing to collect annotations under different conditions.
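The abstract does not name a specific offline metric, but a standard example of one built on graded relevance judgments under the sequential-examination assumption is Discounted Cumulative Gain (DCG). The sketch below is an illustration of that general idea, not the authors' specific evaluation setup.

```python
import math

def dcg(relevance_grades, k=None):
    """Discounted Cumulative Gain over a ranked list of graded judgments.

    Each position i (0-based) is discounted by log2(i + 2), encoding the
    assumption that users scan results from top to bottom and pay less
    attention to lower-ranked items.
    """
    if k is not None:
        relevance_grades = relevance_grades[:k]
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance_grades))

# Example: four-level grades (0-3) assigned by annotators to the top-5 results.
print(dcg([3, 2, 3, 0, 1], k=5))
```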
