Abstract

This paper examines how people communicate with computers using speech. Automatic speech recognition (ASR) transforms speech into text, while automatic speech synthesis, or text-to-speech (TTS), performs the reverse task. ASR has been largely developed based on speech coding theory, while simulating certain spectral analyses performed by the ear. Typically, a Fourier transform is employed, but with frequencies warped to follow the auditory Bark scale and the spectral representation simplified by decorrelation into cepstral coefficients. Current ASR provides good accuracy and performance on limited practical tasks, but exploits only the most rudimentary knowledge about human production and perception phenomena. The popular mathematical model called the hidden Markov model (HMM) is examined; first-order HMMs are efficient but ignore long-range correlations in actual speech. Common language models use a time window of three successive words in their syntactic-semantic analysis. Speech synthesis is the automatic generation of a speech waveform, typically from an input text. As with ASR, TTS starts from a database of information previously established by analysis of large amounts of training data, both speech and text. Previously analyzed speech is stored in the database as small units, for concatenation in the proper sequence at runtime. TTS systems first perform text processing, including letter-to-sound conversion, to generate the phonetic transcription. Intonation must be properly specified to approximate the naturalness of human speech. Modern synthesizers using large databases of stored spectral patterns or waveforms produce highly intelligible synthetic speech, but naturalness remains to be improved.
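To make the Bark-warped cepstral analysis described above concrete, the following is a minimal NumPy sketch, not taken from the paper: a windowed frame is Fourier-transformed, its power spectrum is smoothed by triangular filters spaced on the Bark scale, and the log band energies are decorrelated into cepstral coefficients with a DCT. The function names, the Traunmüller approximation of the Bark scale, and the choice of 24 filters and 13 coefficients are illustrative assumptions; a production front end would also add pre-emphasis, liftering, and delta features.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmüller approximation of the Bark scale (assumed here for illustration).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(b):
    # Inverse of the Traunmüller approximation.
    return 1960.0 * (b + 0.53) / (26.28 - b)

def bark_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Bark scale."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bark_edges = np.linspace(hz_to_bark(0.0), hz_to_bark(sample_rate / 2.0),
                             n_filters + 2)
    hz_edges = bark_to_hz(bark_edges)
    fb = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rise = (fft_freqs - lo) / (center - lo)
        fall = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rise, fall))
    return fb

def cepstral_features(frame, sample_rate=16000, n_filters=24, n_ceps=13):
    """One speech frame -> decorrelated cepstral coefficients."""
    n_fft = len(frame)
    # Short-time Fourier analysis: power spectrum of a Hamming-windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Perceptual smoothing: integrate the spectrum in Bark-spaced bands.
    fb = bark_filterbank(n_filters, n_fft, sample_rate)
    band_energies = np.log(fb @ spectrum + 1e-10)
    # DCT-II decorrelates the log band energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ band_energies

# Usage: a 25 ms frame (400 samples at 16 kHz) of synthetic noise standing in for speech.
frame = np.random.randn(400)
print(cepstral_features(frame))
```

The resulting low-dimensional, largely decorrelated feature vector is what a first-order HMM acoustic model would typically consume frame by frame.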
