Abstract

A novel way to address the challenge of creating descriptive metadata for visual cultural heritage is to invite users to play Human Computation Games (HCGs). This study investigates tags generated by an HCG launched at The Royal Library of Denmark and compares them with descriptors assigned to the same images by professional indexers from the same institution. The analysis classifies tags and descriptors by term-category and measures the semantic overlap between them; the overlap was established by identifying thesaurus relations between a sample of tags and descriptors. The analysis shows that more than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations were 'same/equivalent', roughly 20% were 'associative', and 20% were 'hierarchical'. For the hierarchical thesaurus relations, tags typically described images at a less specific level than descriptors. Furthermore, game-generated tags tend to describe 'artifacts/objects' and thus typically represent what is in the picture rather than what it is about. Descriptors primarily belonged to this term-category as well, but also included a substantial proportion of 'Proper nouns', mainly named locations. Tags generated by the game but not validated by player agreement had a much higher frequency of 'subjective/narrative' tags, as well as more errors and a few cases of vandalism. The overall findings suggest that game-generated tags could complement existing metadata and be integrated into existing workflows.
