Abstract

The use of mixed language in day-to-day speech is becoming common and is accepted as being syntactically correct. However, machine recognition of mixed language speech is a challenge for a conventional speech recognition engine. There are studies on how to enable recognition of mixed language speech. At one end of the spectrum is the use of acoustic models of the complete phone set of the mixed language, while at the other end is the use of a language identification module followed by language-dependent speech recognition engines. Each of these has its own implications. In this paper, we approach the problem of mixed language speech recognition using available resources and show that by constructing an appropriate pronunciation dictionary and modifying the language model to use mixed language, one can achieve good recognition accuracy on spoken mixed language.

Highlights

  • Recognition of mixed language speech is still in its initial stages of research. There are two approaches reported in the literature

  • Mixed language, also termed code switching in the literature, arises through the fusion of two or more, usually distinct, source languages, normally in situations of thorough bilingualism, so that it is not possible to classify the resulting language as belonging to either of the language families that were its source [17], [1], [2]

  • It is similar to a language specific automatic speech recognition (ASR), except that the acoustic model (AM), language model (LM) and pronunciation lexicon (PL) are built for the mixed language


Summary

Existing Approaches

Recognition of mixed language speech is still in its initial stages of research. There are two approaches reported in the literature. The one-pass approach is similar to a language-specific ASR, except that the AM, LM and PL are built for the mixed language. Note that this approach needs a mixed language speech and text corpus, which is generally not available. We used the one-pass framework, but with the AM of a single language (which was readily available) instead of undertaking the Herculean task of collecting and transcribing a speech corpus to build an AM for the complete phone set that encompasses both languages. The reasons for using these AMs instead of an AM for the mixed language were that (a) they were readily available for use and (b) building acoustic models for the mixed language was too cumbersome, requiring field collection of a large speech corpus to which we did not have access. All Hindi words are first transliterated into English, and the pronunciation of each transliterated word is obtained using [14] or approximate phoneme mapping (APM).
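
The approximate phoneme mapping step can be illustrated with a small sketch. It assumes that each transliterated Hindi word already comes with a sequence of Hindi phoneme labels, and that each label is replaced by the closest phone in the English AM's phone set. The mapping table, the phoneme labels, the function names and the sample words below are all hypothetical and chosen for illustration only; they are not the table or the tool used in the paper.

```python
# Hypothetical mapping from Hindi phoneme labels to the closest phones in an
# English acoustic model's phone set (ARPAbet-style labels). These pairs are
# illustrative assumptions, not the mapping used in the paper.
APPROX_PHONE_MAP = {
    "k": "K", "kh": "K", "g": "G", "gh": "G",
    "t": "T", "th": "T", "d": "D", "dh": "D",
    "n": "N", "m": "M", "r": "R", "l": "L", "s": "S", "h": "HH",
    "a": "AH", "aa": "AA", "i": "IH", "ii": "IY", "u": "UH", "uu": "UW",
    "e": "EY", "o": "OW",
}


def map_phonemes(hindi_phonemes):
    """Replace each Hindi phoneme label with its approximate English phone."""
    missing = [p for p in hindi_phonemes if p not in APPROX_PHONE_MAP]
    if missing:
        raise KeyError(f"no approximate mapping for: {missing}")
    return [APPROX_PHONE_MAP[p] for p in hindi_phonemes]


def build_lexicon(english_entries, hindi_entries):
    """Merge English pronunciations and mapped Hindi pronunciations."""
    lexicon = {word: list(phones) for word, phones in english_entries.items()}
    for word, phonemes in hindi_entries.items():
        lexicon[word] = map_phonemes(phonemes)
    return lexicon


# Sample entries (hypothetical): one English word, two transliterated Hindi words.
english = {"ticket": ["T", "IH", "K", "AH", "T"]}
hindi = {"ghar": ["gh", "a", "r"], "naam": ["n", "aa", "m"]}

lexicon = build_lexicon(english, hindi)
for word in sorted(lexicon):
    print(word, " ".join(lexicon[word]))
```

Because aspirated and unaspirated stops collapse onto the same English phone in this sketch, distinct Hindi words can end up with identical pronunciations; a real mapping would need to be checked for such collisions.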

