Language Identification for Under-Resourced Languages in the Basque Context

Nora Barroso, Manuel Graña, Karmele López De Ipiña, Aitzol Ezeiza

doi:10.1007/978-3-642-19644-7_50

Abstract

Automatic Speech Recognition (ASR) is a broad research area that attracts considerable effort from the research community. Interest in multilingual systems arises in the Basque Country because there are three official languages (Basque, Spanish, and French) with much linguistic interaction among them, even though Basque has very different roots from the other two. Developing Multilingual Large Vocabulary Continuous Speech Recognition systems involves issues such as Language Identification, Acoustic Phonetic Decoding, Language Modeling, and the development of appropriate Language Resources. This paper describes the development of a Language Identification (LID) system oriented to robust Multilingual Speech Recognition in the Basque context. The work presents hybrid strategies for LID, based on the selection of system elements by several classifiers and Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages, and stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams).

Keywords

Language Identification, Under-Resourced Languages, Discriminant Analysis, Covariance Matrix Estimation Methods

Full Text