Abstract

While hidden Markov models (HMMs) serve as the basic acoustic modeling framework for many automatic speech recognition systems, they are known to model the duration of sound units poorly. Phone duration normalization can be accomplished by inserting reconstructed frames when a phone is shorter than the desired duration and by deleting frames when a phone is longer than the desired duration. If phone segmentations are known a priori, this technique achieves relative reductions in word error rate (WER) of up to 35%, confirming the conjecture that speech with normalized phone durations may be modeled better and discriminated more accurately by standard HMM acoustic models. Unfortunately, duration normalization using imperfect, automatically generated phone segmentations has not yielded significant recognition improvements. A modification of the duration normalization approach has therefore been developed: three different feature streams are generated for each utterance using various combinations of expansion and contraction of the hypothesized phone segments, and each stream is recognized using an acoustic model trained for that stream. While the resulting recognition hypotheses are not themselves significantly better than the baseline, they can be automatically combined to produce relative improvements in WER of up to 7.7% across several speech databases. [Work supported by DARPA and Telefónica.]
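As an illustration of the frame-level normalization the abstract describes, the sketch below resamples a single phone segment to a fixed target length: when the segment is too short, missing frames are reconstructed (here by linear interpolation between neighboring frames, an assumed scheme; the abstract does not specify the reconstruction method), and when it is too long, intermediate frames are effectively deleted. The function name, the feature dimensionality, and the 10-frame target duration are all hypothetical choices for illustration.

```python
import numpy as np

def normalize_phone_duration(frames: np.ndarray, target_len: int) -> np.ndarray:
    """Resample a phone segment of shape (num_frames, feat_dim) to target_len frames.

    Expansion reconstructs missing frames by linear interpolation between
    neighboring frames (an assumption, not the paper's stated method);
    contraction effectively deletes intermediate frames.
    """
    src_len = len(frames)
    if src_len == target_len:
        return frames
    # Map each output frame to a (possibly fractional) source position.
    positions = np.linspace(0.0, src_len - 1, num=target_len)
    lower = np.floor(positions).astype(int)
    upper = np.minimum(lower + 1, src_len - 1)
    frac = (positions - lower)[:, None]
    return (1.0 - frac) * frames[lower] + frac * frames[upper]

# Hypothetical usage: normalize every hypothesized phone segment of an
# utterance to a common duration (10 frames here, chosen arbitrarily).
utterance = np.random.randn(200, 13)         # 200 frames of 13-dim features
segments = [(0, 55), (55, 120), (120, 200)]  # hypothesized phone boundaries
normalized = np.vstack(
    [normalize_phone_duration(utterance[s:e], 10) for s, e in segments]
)
```

In the three-stream variant, analogous expansion/contraction rules would be applied per stream, each stream decoded with its own acoustic model, and the resulting hypotheses merged, presumably with a hypothesis-combination scheme such as ROVER, though the abstract does not name the combination method.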
