Abstract

The gradual migration of television from broadcast diffusion to Internet diffusion offers countless possibilities for the generation of rich navigable contents. However, it also raises numerous scientific issues regarding delinearization of TV streams and content enrichment. In this paper, we study how speech can be used at different levels of the delinearization process, using automatic speech transcription and natural language processing (NLP) for the segmentation and characterization of TV programs and for the generation of semantic hyperlinks in videos. Transcript-based video delinearization requires natural language processing techniques robust to transcription peculiarities, such as transcription errors, and to domain and genre differences. We therefore propose to modify classical NLP techniques, initially designed for regular texts, to improve their robustness in the context of TV delinearization. We demonstrate that the modified NLP techniques can efficiently handle various types of TV material and be exploited for program description, for topic segmentation, and for the generation of semantic hyperlinks between multimedia contents. We illustrate the concept of cross-media semantic navigation with a description of our news navigation demonstrator presented during the NEM Summit 2009.

Highlights

  • Television is currently undergoing a deep mutation, gradually shifting from broadcast diffusion to Internet diffusion

  • Even if one can anticipate that Internet diffusion will predominate in the near future, we firmly believe that the two diffusion modes will coexist for a long time, as they correspond to very different consumption habits

  • Using confidence measures for natural language processing (NLP) and information retrieval (IR) can help avoid error-prone hard decisions from the automatic speech recognition (ASR) system and partially compensate for recognition errors, but this requires that standard NLP and IR algorithms be modified, as we propose in this paper


Summary

Introduction

Television is currently undergoing a deep mutation, gradually shifting from broadcast diffusion to Internet diffusion. We propose to adapt existing NLP and IR techniques to ASR transcripts, exploiting confidence measures and external knowledge such as semantic relations, to develop robust spoken content processing techniques at various stages of the delinearization chain. We show that this strategy is effective in making the processing of noisy ASR transcripts more robust and that it permits speech-based automatic delinearization of TV streams. The proposed robust spoken document processing techniques are used for content description across a wide variety of program genres and for efficient topic segmentation of news, reports and documentaries.
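To make the idea of exploiting confidence measures concrete, the following is a minimal sketch, not the authors' actual algorithm. It assumes that the ASR system outputs each word with a confidence score in [0, 1], and it replaces hard term counts with confidence-weighted counts before comparing two transcript segments. The function names `weighted_tf` and `cosine`, and the toy segments, are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def weighted_tf(words):
    """Build a soft term-frequency vector from (token, confidence) pairs.

    Assumption: each confidence lies in [0, 1]. Every occurrence adds its
    confidence to the term's weight instead of adding a hard count of 1.
    """
    tf = defaultdict(float)
    for token, confidence in words:
        tf[token.lower()] += confidence
    return dict(tf)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy segments: "whether" stands in for a likely misrecognition with low confidence.
politics_a = weighted_tf([("election", 0.95), ("vote", 0.9), ("whether", 0.2)])
politics_b = weighted_tf([("election", 0.9), ("vote", 0.8)])
forecast = weighted_tf([("weather", 0.9), ("whether", 0.3)])

# The low-confidence word contributes little, so the two politics segments
# remain more similar to each other than to the forecast segment.
print(cosine(politics_a, politics_b) > cosine(politics_a, forecast))
```

With hard counts, a misrecognized word weighs as much as a correctly recognized one. With soft counts, its influence on the similarity shrinks in proportion to the recognizer's own uncertainty, which is the kind of modification to a standard IR measure that the summary alludes to.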

Transcription of Spoken TV Contents
Using Speech As a Program Descriptor
Topic Segmentation of TV Programs
Automatically Linking Contents
Illustration
Future Work