Abstract

This chapter addresses major issues of language modeling in the context of multilingualism, such as the portability of existing language modeling techniques to less-investigated languages, issues of morphological complexity and word segmentation, and the feasibility of cross-linguistic language modeling. A language model is a probability assignment over all possible word sequences in a natural language. Its goal is to assign relatively large probability to meaningful, grammatical, or frequent word sequences compared to rare, ungrammatical, or nonsensical ones. This chapter adopts a markedly statistical notion of language modeling. Issues in language modeling involve deciding what constitutes a word, how the words in an utterance are statistically dependent, and how to estimate the parameters of this statistical dependence from data. Considerations of multilingual processing arise in two qualitatively different settings, leading to two related but different avenues of research. The first setting is the development of speech and language technology in a new language. The second setting is a multilingual system, in which a single recognition engine must accept spoken input and produce spoken output in multiple languages.
