Chapter 3 - Speech Processing

Thierry Dutoit, Stéphane Dupont

doi:10.1016/b978-0-12-374825-6.00003-4

Abstract

This chapter highlights the main techniques used in today's automatic speech recognition (ASR) and text-to-speech (TTS) systems, with special emphasis on the concepts, on the requirements imposed by their implementation, and on the resulting limitations. ASR is a major component in many spoken language systems. It enables the development of useful concepts for human–machine interfaces and also for computer-mediated human-to-human communication. Statistical modeling paradigms and their extensions are key approaches to ASR. Under suitable assumptions, these technologies provide a means to factorize the different layers of spoken language structure, which gives rise to several major components. First, the speech signal is analyzed using feature extraction algorithms. The acoustic model then represents the knowledge necessary to recognize the individual sounds involved in speech. Words are built as sequences of those individual sounds, a mapping represented in the pronunciation model. Finally, the language model represents the knowledge about how words are grouped to build sentences. ASR technology draws on a range of disciplines, including digital signal processing, probability, estimation and information theory, and also, naturally, studies of the production and perception of speech and of the structure of spoken language.
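As a rough illustration of the factorization described in the abstract, the sketch below combines a toy pronunciation model, a toy language model, and a simplified acoustic score to pick the most likely word. It is not the chapter's implementation: the names `LEXICON`, `LANGUAGE_MODEL`, `acoustic_log_likelihood`, and `recognize`, the sound labels, and all probabilities are hypothetical. It also assumes that feature extraction has already produced one frame of per-sound likelihoods for each sound, so no time alignment is needed.

```python
import math

# Hypothetical toy models; every name and number here is an illustrative assumption.

# Pronunciation model: each word is a sequence of individual sounds.
LEXICON = {
    "hi": ["HH", "AY"],
    "high": ["HH", "AY"],
    "he": ["HH", "IY"],
}

# Language model: prior probability P(W) of each one-word "sentence".
LANGUAGE_MODEL = {"hi": 0.5, "high": 0.2, "he": 0.3}


def acoustic_log_likelihood(frames, sounds):
    """Return log P(X|sounds), assuming exactly one frame per sound.

    Each frame is a dict of per-sound likelihoods, standing in for the
    output of feature extraction followed by acoustic scoring.
    """
    if len(frames) != len(sounds):
        return float("-inf")
    return sum(math.log(frame.get(s, 1e-9)) for frame, s in zip(frames, sounds))


def recognize(frames):
    """Pick the word W that maximizes log P(X|W) + log P(W)."""
    def score(word):
        acoustic = acoustic_log_likelihood(frames, LEXICON[word])
        prior = math.log(LANGUAGE_MODEL[word])
        return acoustic + prior

    return max(LEXICON, key=score)


frames = [
    {"HH": 0.9, "AY": 0.05, "IY": 0.05},
    {"HH": 0.1, "AY": 0.6, "IY": 0.3},
]
best = recognize(frames)
print(best)  # hi
```

In this toy setup, "hi" and "high" share the same pronunciation and therefore the same acoustic score, so only the language model prior separates them. That is the kind of division of labor between components that the factorization is meant to provide.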


Journal: Multimodal Signal Processing
Publication Date: Jan 1, 2010
Citations: 1

Similar Papers

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition
01 Jan 2004

Exploring recurrent neural network based acoustic and linguistic modeling for children's speech recognition
Sreeram Ganji ... Rohit Sinha
01 Nov 2017

Conversational speech recognition
Thomas H Crystal
The Journal of the Acoustical Society of America | VOL. 102
01 Nov 1997

Automatic long audio alignment for conversational Arabic speech
Mohamed Elmahdy
01 Jan 2013
