Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes

Bernd T Meyer, Birger Kollmeier, Thomas Brand

doi:10.1121/1.3514525

Abstract

This study aims to quantify the gap between the recognition performance of human listeners and that of an automatic speech recognition (ASR) system, with a special focus on intrinsic variations of speech such as speaking rate and effort, altered pitch, and the presence of dialect and accent. It also investigates whether the most common ASR features contain all the information required to recognize speech in noisy environments, by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system reached the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which serves as an estimate of the human-machine gap in terms of SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variations strongly increase error rates in both human speech recognition (HSR) and ASR, with a relative increase of up to 120%. An analysis of phoneme duration and recognition rates indicates that human listeners are better able than the machine to identify temporal cues at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
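
To make the two headline quantities concrete, the following is a minimal sketch of how an SNR gap and a relative error-rate increase could be computed from recognition scores. It is not the authors' analysis code. The function names (`snr_at_score`, `snr_gap`, `relative_increase`) and all score values are hypothetical and invented for illustration; they are not data from the study. The sketch also assumes that scores rise monotonically with SNR and that linear interpolation between measured points is acceptable.

```python
# Illustrative sketch only: all names and numbers are hypothetical,
# not taken from the study.

def snr_at_score(snrs, scores, target):
    """Linearly interpolate the SNR at which a score curve reaches `target`.

    Assumes `snrs` is sorted ascending and `scores` is non-decreasing.
    """
    for i in range(len(snrs) - 1):
        s0, s1 = snrs[i], snrs[i + 1]
        a0, a1 = scores[i], scores[i + 1]
        if a0 <= target <= a1:
            if a1 == a0:
                return float(s0)
            return s0 + (target - a0) * (s1 - s0) / (a1 - a0)
    raise ValueError("target score lies outside the measured curve")


def snr_gap(snrs, human_scores, asr_scores, ref_snr):
    """SNR increase the ASR system needs to match the human score at `ref_snr`.

    `ref_snr` must be one of the measured SNR values.
    """
    human_score = human_scores[snrs.index(ref_snr)]
    return snr_at_score(snrs, asr_scores, human_score) - ref_snr


def relative_increase(baseline_error, altered_error):
    """Relative increase of an error rate, in percent."""
    return 100.0 * (altered_error - baseline_error) / baseline_error


# Made-up recognition scores (percent correct) at several SNRs (dB).
SNRS = [-10, -5, 0, 5, 10, 15]
HUMAN = [40, 55, 70, 80, 88, 92]
ASR = [10, 20, 30, 40, 55, 70]

if __name__ == "__main__":
    print("SNR gap (dB):", snr_gap(SNRS, HUMAN, ASR, ref_snr=-10))
    print("Relative error increase (%):", relative_increase(20.0, 44.0))
```

Expressing the gap as an SNR shift rather than a score difference keeps the comparison meaningful across the whole curve, since score differences shrink near ceiling and floor performance.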

Journal: The Journal of the Acoustical Society of America
Publication Date: Jan 1, 2011
Citations: 38

Similar Papers

Early Decision Making in Continuous Speech
Odette Scharenborg ... Lou Boves
01 Jun 2007

Automatic and human speech recognition in null grammar
Amit Juneja
The Journal of the Acoustical Society of America | VOL. 130
01 Oct 2011

Performance Analysis of various Front-end and Back End Amalgamations for Noise-robust DNN-based ASR
Mohit Dua ... Vinam Agrawal
Recent Advances in Computer Science and Communications | VOL. 14
01 Dec 2021

Enhanced Robot Speech Recognition Using Biomimetic Binaural Sound Source Localization
Jorge Davila-Chacon ... Jindong Liu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 30
04 Jun 2018
