Story Boundary Detection Research Articles

This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues that work well for English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that speaker-normalized pitch features cannot clearly discriminate story boundaries from utterance boundaries, because these features vary widely across Mandarin syllable tones. We therefore propose speaker- and tone-normalized pitch features, which provide clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, while speaker-normalized speech energy and syllable duration are not. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese, i.e., speaker- and tone-normalized pitch features should be preferred in Mandarin story segmentation. We show that combining different prosodic cues achieves a very high F-measure of 93.04%, owing to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly use pause duration and pitch to separate speech into stories.
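As a rough illustration of what speaker- and tone-normalization could look like, the sketch below z-scores each pitch value against the mean and standard deviation of its own (speaker, tone) group. The abstract does not specify the normalization used in the paper, so the z-score formulation, the function name `normalize_pitch`, and the `(speaker, tone, f0_hz)` input layout are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_pitch(samples):
    """Z-score pitch values within each (speaker, tone) group.

    samples: list of (speaker, tone, f0_hz) tuples (hypothetical layout).
    Returns one normalized value per input sample, in input order.
    """
    # Collect the raw pitch values for every (speaker, tone) pair.
    groups = defaultdict(list)
    for speaker, tone, f0 in samples:
        groups[(speaker, tone)].append(f0)

    # Per-group mean and population standard deviation.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    normalized = []
    for speaker, tone, f0 in samples:
        mu, sigma = stats[(speaker, tone)]
        # A group with no spread carries no relative pitch information.
        normalized.append((f0 - mu) / sigma if sigma > 0 else 0.0)
    return normalized


samples = [
    ("A", 1, 200.0), ("A", 1, 220.0),  # speaker A, tone 1
    ("A", 4, 150.0), ("A", 4, 170.0),  # speaker A, tone 4
    ("B", 1, 120.0), ("B", 1, 100.0),  # speaker B, tone 1
]
scores = normalize_pitch(samples)
```

After this step, a value of +1 means "one standard deviation above what this speaker typically produces on this tone", so values from different tones and speakers become comparable before being passed to a classifier such as a decision tree.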

This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The automatic multimedia content indexing system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. New spectral-based features and a smoothing method in the AS module improved speech detection in the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low WER, as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates the results from the AS module, TS module, and SC module. Combining the TS results with the AS and SC results improved story boundary detection accuracy compared with using the TS results alone.
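For readers unfamiliar with segmentation based on local word frequency vectors, here is a minimal sketch of one common formulation: compare the word counts in a window before and after each sentence gap, and flag gaps where the cosine similarity is low. This is not the system described in the abstract. The window size, threshold, whitespace tokenization, and the function names `cosine` and `boundary_candidates` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_candidates(sentences, window=2, threshold=0.1):
    """Return gap indices i (between sentence i-1 and sentence i) where the
    word-count vectors of the neighbouring windows are dissimilar."""
    tokens = [s.lower().split() for s in sentences]
    candidates = []
    for i in range(1, len(tokens)):
        # Word counts in up to `window` sentences on each side of the gap.
        left = Counter(w for t in tokens[max(0, i - window):i] for w in t)
        right = Counter(w for t in tokens[i:i + window] for w in t)
        if cosine(left, right) < threshold:
            candidates.append(i)
    return candidates


sentences = [
    "stocks fell sharply today",
    "markets and stocks closed lower",
    "rain is expected tomorrow",
    "heavy rain may cause flooding",
]
gaps = boundary_candidates(sentences)
```

In this toy input, the only gap with no shared vocabulary on either side is the one between the second and third sentences, so it is the only candidate returned. A concept-vector approach would replace the raw word counts with vectors that also capture related words, which is the direction the abstract reports as more accurate.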

Articles published on Story Boundary Detection

Automatic Story Segmentation for TV News Video Using Multiple Modalities

Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news

A Cascaded Broadcast News Highlighter

Automatic multimedia indexing: combining audio, speech, and visual information to index broadcast news
