Abstract

In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources. The aim is to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority, and service levels to external users. We conclude that, with parameter settings optimized through a rigorous evaluation of precision and accuracy, the quality of automatic term suggestion is sufficiently high. We furthermore analyze the term extraction after it was taken into production, focusing on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to develop the system further step by step and to assess the effect of the transformation from manual to automatic annotation from an end-user perspective. Future work will also cover deploying different information sources, including annotations based on multimodal video analysis such as speaker recognition and computer vision.

Highlights

  • Audiovisual content in digital libraries is being labeled manually, typically using controlled and structured vocabularies or domain-specific thesauri

  • One source of mismatches can occur in the Named Entity Recognizer module

  • In this paper we reported on the two-stage evaluation of automatic labeling of audiovisual content in an archive production environment


Summary

Introduction

Audiovisual content in digital libraries is being labeled manually, typically using controlled and structured vocabularies or domain-specific thesauri. This is not a sustainable model given (1) the increasing amounts of audiovisual content that digital libraries ingest (quantitative perspective), and (2) a growing emphasis on improving access opportunities for these data (qualitative perspective). The latter is addressed in the context of traditional search, but increasingly also in the context of linking within and across collections, libraries, and media. Search and linking are shifting from a document-level perspective towards a segment-level perspective in which segments are regarded as individual, ‘linkable’ media objects. In this context, the traditional, manual labeling process requires revision to increase both the quantity and the quality of labels. The proposed term suggestion methods were evaluated in terms of Preci-

