Chapter 5 - Multilingual Dictionaries

Martine Adda-Decker, Lori Lamel

doi:10.1016/b978-012088501-5/50008-1

Abstract

This chapter focuses on multilingual dictionaries for use in automatic speech recognition. It provides an overview of dictionary modeling and generation issues in the context of multilingual speech processing. For most automatic speech recognition systems, multilingual pronunciation dictionaries are still collections of monolingual dictionaries. Across languages, the proportion of imported words (that is, words shared with other languages) increases with vocabulary size. This chapter addresses the various steps in lexical development, including normalization, the choice of word items, the selection of a word list, and pronunciation generation. Tokenization and normalization are first addressed in the context of written sources, which often form the basis of language modeling material. Suitable units for dictionary modeling are discussed in light of similarities and dissimilarities between languages. Dictionary generation techniques are illustrated, along with the pros and cons of automatic and semiautomatic procedures. Only languages for which written resources are available are considered.

Full Text