Abstract

This chapter introduces the concept of multilingual acoustic modeling in automatic speech recognition. The aim of an automatic speech recognition (ASR) system is to convert speech into a sequence of written words. A typical system consists of four major components: the acoustic model, which models the sound units of a language based on features extracted from the speech signal; the pronunciation dictionary, which usually describes the pronunciation of each word as a concatenation of the modeled sound units; the language model, which estimates the probability of word sequences; and the decoder, which efficiently searches the very large space of possible word sequences and selects the most likely one. This chapter describes sound inventories that are suitable as basic units for multilingual acoustic models, investigates techniques and algorithms for developing these models, and gives examples of their application in speech recognition, with special emphasis on their use for rapid porting to new languages.
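
As a rough sketch of how the four components fit together, the standard Bayes decision rule for speech recognition can be written as follows. The abstract does not state this formula. The symbols X (the sequence of speech features), W (a word sequence), and Q (a sequence of sound units) are notation assumed here for illustration.

```latex
\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \; p(X \mid W)\, P(W),
\qquad
p(X \mid W) = \sum_{Q} p(X \mid Q)\, P(Q \mid W)
```

In this reading, p(X | Q) corresponds to the acoustic model, P(Q | W) to the pronunciation dictionary, P(W) to the language model, and the arg max over W to the search carried out by the decoder.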
