Abstract

Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The main difficulty lies in learning long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features; the resulting classification probabilities are then used, together with the local features, as contextual information to train new classifiers. By iteratively using the updated probabilities as contextual information, the algorithm can accurately model contextual dependencies and improve classification performance. The advantages of this method include its flexible structure and its ability to capture contextual relationships. Using the auto-context algorithm with a support vector machine and the acoustic context, we improve detection accuracy by about 3% and F-score by more than 7% on both two-way and four-way pitch accent detection. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially in F-score on boundary detection, and it also outperforms an n-gram language model on pitch accent detection.
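
To make the iterative procedure concrete, the following is a minimal sketch of auto-context training with scikit-learn SVMs, assuming a syllable-level acoustic feature matrix X and prosodic event labels y; the context window radius, number of iterations, and kernel choice are illustrative assumptions, not the settings used in the paper.

```python
# Minimal sketch of auto-context training with SVMs (scikit-learn).
# Assumptions (not from the paper): X is an (n_syllables x n_features)
# matrix of local acoustic features, y holds the prosodic event labels,
# and the context is a window of neighbouring syllables' probabilities.
import numpy as np
from sklearn.svm import SVC


def context_window(probs, radius=3):
    """Stack each syllable's neighbours' class probabilities from the
    previous iteration into a context vector, zero-padded at the edges."""
    n, k = probs.shape
    padded = np.vstack([np.zeros((radius, k)), probs, np.zeros((radius, k))])
    return np.hstack([padded[i:i + n] for i in range(2 * radius + 1)])


def train_auto_context(X, y, n_iters=3, radius=3):
    """Iteratively train classifiers, feeding each round's class
    probabilities back in as contextual features for the next round."""
    n_classes = len(np.unique(y))
    # Uninformative start: the first round effectively uses only local features.
    probs = np.full((len(y), n_classes), 1.0 / n_classes)
    models = []
    for _ in range(n_iters):
        feats = np.hstack([X, context_window(probs, radius)])
        clf = SVC(kernel="rbf", probability=True).fit(feats, y)
        probs = clf.predict_proba(feats)  # updated contextual information
        models.append(clf)
    return models
```

At test time the trained classifiers would be applied in the same order, re-computing the context features from each round's predicted probabilities before applying the next classifier.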

Highlights

  • Speech is often characterized across two levels of expression: the segmental level encompassing basic phonetic meaning and the prosodic level with additional suprasegmental information

  • We investigate the utilization of contextual information for pitch accent and boundary detection by using the auto-context algorithm, which was first proposed in [3] for high-level computer vision tasks like image segmentation

  • For the n-gram approach, we refer to the results of the representative work [8], in which the same two-way pitch accent detection and binary boundary detection are implemented on the Boston University Radio Speech Corpus (BURSC) using syllable-level acoustic features of F0, timing cues, and energy


Summary

Introduction

Speech is often characterized across two levels of expression: the segmental level encompassing basic phonetic meaning and the prosodic level carrying additional suprasegmental information. We investigate the utilization of contextual information for pitch accent and boundary detection using the auto-context algorithm, which was first proposed in [3] for high-level computer vision tasks such as image segmentation. In this algorithm, the classification probabilities obtained from the preceding iteration serve as possible contextual clues and are used together with the acoustic features to train the classifier of the current iteration. In prior work, the posterior probabilities provided by a decision tree were combined with a bigram prosodic label sequence model to detect pitch accents and boundary tones at the syllable level. To investigate the importance of contextual information in prosodic event detection, the work in [16] examined pitch accent detection performance at the word, syllable, and vowel levels.
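
As an illustration of how per-syllable classifier posteriors can be combined with a bigram label sequence model, the sketch below performs a simple Viterbi decode in the log domain; the variable names, the smoothing constant, and the decoding scheme are assumptions for illustration rather than the exact formulation used in that prior work.

```python
# Hedged sketch: combining per-syllable classifier posteriors with a bigram
# prosodic label sequence model via Viterbi decoding.  `posteriors` is an
# (n_syllables x n_labels) array from any classifier, `bigram[i, j]` is the
# estimated P(label_j | previous label_i), and `prior` is the label prior;
# these names and the smoothing constant are illustrative.
import numpy as np


def decode_labels(posteriors, bigram, prior):
    """Return the label sequence maximising classifier scores plus
    bigram transition scores (log domain)."""
    logp = np.log(posteriors + 1e-12)
    logb = np.log(bigram + 1e-12)
    n, k = posteriors.shape
    score = np.log(prior + 1e-12) + logp[0]   # scores after the first syllable
    back = np.zeros((n, k), dtype=int)        # best previous label per step
    for t in range(1, n):
        cand = score[:, None] + logb          # add transition scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logp[t]
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```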

