Abstract

Numerous studies on the development of automatic speech recognition (ASR) applications have reported speech recognition errors, which are classified into lexical and acoustic errors. These errors distort speech signals, degrading the accuracy and performance of speech recognition applications. The lexical error problem has been partially addressed. Acoustic speech recognition error, referred to here as the user's acoustic irrational behavior, has largely been ignored. It causes a high error rate and low accuracy, which is a central problem and an obstacle to the wide adoption of speech recognition technology. Users do not always behave rationally, especially when interacting with a particular speech recognition application. The persistent presence of the user's acoustic irrational behavior in speech has intensified the need to detect and correct such errors automatically, because current research focuses only on detecting this behavior, not on correcting, reformulating or re-sizing it. Hence, this paper presents an acoustic nudging model that automatically corrects (reformulates) the user's acoustic irrational behavior in speech to achieve higher performance and accuracy. It uses the acoustic parameters Pitch, Time gaps between words, Timbre (descend and ascend time) and Loudness. This study establishes a foundation for reducing the error rate and improving performance and accuracy in speech recognition applications. It does so through automatic detection and re-formulation of the user's acoustic irrational behavior in the speech signal, which makes the model applicable to any speech recognition application. The outcome of this study should be useful for enhancing accuracy and performance in automatic speech recognition.

Highlights

  • Speech variations, whether intrinsic or extrinsic, cause Automatic Speech Recognition (ASR) errors (Benzeghiba et al., 2006)

  • In the context of this study, the term re-formulation means automatically re-adjusting and re-sizing speaker-related errors, i.e., the user's acoustic irrational behavior during speech communication. The Acoustic Nudging Model achieves this by re-formulating the speech parameters that make up human acoustic behavior: Pitch, Loudness, Timbre and Time Gaps between words (measured in seconds, s)

  • The acoustic nudging modeling technique was applied to 8 speech samples from the training dataset, recorded by 8 speakers: two adult males, two adult females, two male children and two female children
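The highlights name the parameters that the model re-formulates (Pitch, Loudness, Timbre ascend/descend time and time gaps between words), but they do not say how these parameters are measured. Below is a minimal, standard-library Python sketch of one plausible way to measure them. It assumes a mono signal with samples in [-1, 1]. The function names, the autocorrelation pitch estimate, the -40 dB silence threshold and the 10%-90% envelope definition of ascend and descend time are all illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Tuple

# Hypothetical feature extractors; the paper does not specify its measurement methods.

def frames(signal: List[float], frame_len: int) -> List[List[float]]:
    """Split a mono signal into non-overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]

def rms(frame: List[float]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def loudness_db(frame: List[float]) -> float:
    """Loudness proxy: RMS level in dB relative to full scale (assumes samples in [-1, 1])."""
    return 20.0 * math.log10(max(rms(frame), 1e-12))

def pitch_hz(frame: List[float], sr: int, fmin: float = 60.0, fmax: float = 500.0) -> float:
    """Pitch proxy: the lag with the highest autocorrelation in [sr/fmax, sr/fmin]."""
    best_lag, best_val = 0, 0.0
    for lag in range(int(sr / fmax), min(int(sr / fmin), len(frame) - 1) + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag if best_lag else 0.0

def word_gaps_s(signal: List[float], sr: int, frame_len: int,
                threshold_db: float = -40.0) -> List[float]:
    """Durations (s) of silent runs lying between two voiced runs (assumed -40 dB threshold)."""
    voiced = [loudness_db(f) > threshold_db for f in frames(signal, frame_len)]
    gaps, run, seen_voice = [], 0, False
    for v in voiced:
        if v:
            if seen_voice and run:
                gaps.append(run * frame_len / sr)  # a silent run just ended between words
            seen_voice, run = True, 0
        elif seen_voice:
            run += 1  # leading silence is ignored; trailing silence is never appended
    return gaps

def ascend_descend_s(signal: List[float], sr: int, frame_len: int) -> Tuple[float, float]:
    """Timbre proxy: 10%-90% rise time and 90%-10% fall time of the frame RMS envelope."""
    env = [rms(f) for f in frames(signal, frame_len)]
    peak = max(env)
    lo, hi = 0.1 * peak, 0.9 * peak
    start = next(i for i, e in enumerate(env) if e >= lo)
    rise = next(i for i, e in enumerate(env) if e >= hi)
    fall = max(i for i, e in enumerate(env) if e >= hi)
    end = max(i for i, e in enumerate(env) if e >= lo)
    dt = frame_len / sr
    return (rise - start) * dt, (end - fall) * dt

if __name__ == "__main__":
    sr, flen = 8000, 400  # 50 ms frames
    word = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(1600)]
    sig = word + [0.0] * 2400 + word  # two 0.2 s "words" separated by 0.3 s of silence
    print(pitch_hz(sig[:flen], sr), loudness_db(sig[:flen]), word_gaps_s(sig, sr, flen))
```

Each estimator is deliberately simple so that the four parameters stay easy to inspect. A production system would more likely use a dedicated pitch tracker and a standardized loudness measure.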

Summary

Introduction

Speech variations, whether intrinsic or extrinsic, cause Automatic Speech Recognition (ASR) errors (Benzeghiba et al., 2006). Sources of variation include:

  • voice changes due to aging, illness and emotional state (anger, frustration, joy, sadness, tiredness, laughter, pride, guilt, relief, etc.)
  • repetitions
  • interruptions
  • channel mismatch, i.e., a mismatch in recording conditions between the training and testing speech data

These are the main challenges of speech recognition. All of these factors corrupt the original queries given by speakers, which leads to ASR errors and distortions (Jiang et al., 2013; Errattahi et al., 2018; Tang et al., 2019). In the context of this study, the term re-formulation means automatically re-adjusting and re-sizing speaker-related errors, i.e., the user's acoustic irrational behavior during speech communication. The Acoustic Nudging Model achieves this by re-formulating the speech parameters that make up human acoustic behavior: Pitch, Loudness, Timbre (ascend and descend time) and Time Gaps between words, measured in seconds (s).
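The following is a hedged Python sketch of one possible re-formulation step that makes the idea of nudging concrete. It assumes that each measured parameter is moved part of the way toward a reference range of "rational" values. The REFERENCE ranges, the strength factor and all helper names are hypothetical placeholders; the paper does not publish its nudging rule or thresholds.

```python
from typing import Dict, List, Tuple

# Hypothetical reference ranges for "rational" acoustic behavior. The paper does not
# publish its thresholds, so these numbers are placeholders for illustration only.
REFERENCE: Dict[str, Tuple[float, float]] = {
    "pitch_hz": (85.0, 300.0),
    "loudness_db": (-30.0, -10.0),
    "gap_s": (0.05, 0.5),
    "ascend_s": (0.01, 0.2),
    "descend_s": (0.01, 0.3),
}

def nudge(value: float, lo: float, hi: float, strength: float = 0.5) -> float:
    """Move a value a fraction `strength` of the way into [lo, hi]; in-range values are unchanged."""
    target = min(max(value, lo), hi)
    return value + strength * (target - value)

def nudge_profile(measured: Dict[str, float], strength: float = 0.5) -> Dict[str, float]:
    """Nudge every measured parameter toward its (hypothetical) reference range."""
    return {k: nudge(v, *REFERENCE[k], strength) for k, v in measured.items()}

def apply_gain(signal: List[float], from_db: float, to_db: float) -> List[float]:
    """Re-formulate loudness by scaling samples by the dB difference, clipping to [-1, 1]."""
    g = 10 ** ((to_db - from_db) / 20)
    return [max(-1.0, min(1.0, x * g)) for x in signal]

def resize_gap(gap: List[float], new_len: int) -> List[float]:
    """Re-size a silent gap by truncating it or padding it with zeros."""
    return gap[:new_len] + [0.0] * max(0, new_len - len(gap))

def reformulate_gap(gap: List[float], sr: int, strength: float = 0.5) -> List[float]:
    """Measure a silent gap, nudge its duration toward the reference range, then re-size it."""
    new_s = nudge(len(gap) / sr, *REFERENCE["gap_s"], strength)
    return resize_gap(gap, round(new_s * sr))
```

With strength=1.0 the nudge becomes a hard clamp into the reference range. Smaller values give a gentler correction that leaves part of the speaker's own behavior intact. For example, a 1.2 s pause nudged with strength 0.5 toward a 0.5 s maximum becomes 0.85 s.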

Related Work
DESIGN PHASE
X: User’s acoustic rational behavior m
Results
Conclusion