Semantic speech recognition in the Basque context Part II: language identification for under-resourced languages

Nora Barroso, Aitzol Ezeiza, Karmele López De Ipiña, Manuel Graña, Carmen Hernández

doi:10.1007/s10772-011-9114-4

Abstract

This paper describes the development of a Language Identification (LID) system oriented to robust Multilingual Speech Recognition in the Basque context, where three languages coexist: Basque, Spanish and French. The LID system is integrated in GorUP, a Semantic Speech Recognition system for complex industrial environments described in Part I. The work presents hybrid strategies for LID, based on the selection of system elements by several classifiers (Support Vector Machines and Multilayer Perceptron) and Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages, and on stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams). The LID tool manages the main elements of the Automatic Speech Recognition system (Acoustic Phonetic Decoder, Language Model and Lexicons).
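
As an illustration of how n-gram statistics can separate the three languages, the following minimal sketch trains one add-one-smoothed bigram model per language and assigns an input sequence to the model with the highest log-likelihood. It is not the GorUP implementation: the toy word lists, the use of letters in place of decoded phone units, and the names `train_ngram_model`, `log_likelihood` and `identify_language` are all assumptions made for the example.

```python
import math
from collections import Counter

def train_ngram_model(sequences, n=2):
    """Count n-grams and their (n-1)-symbol contexts over padded sequences."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            ngrams[ctx + (padded[i],)] += 1
            contexts[ctx] += 1
    return {"n": n, "ngrams": ngrams, "contexts": contexts, "vocab": vocab}

def log_likelihood(model, seq, vocab_size):
    """Add-one (Laplace) smoothed log-probability of seq under the model."""
    n = model["n"]
    padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        count = model["ngrams"][ctx + (padded[i],)]
        total += math.log((count + 1) / (model["contexts"][ctx] + vocab_size))
    return total

def identify_language(models, seq):
    """Return the language whose model gives seq the highest log-likelihood."""
    # A shared vocabulary size keeps the smoothing comparable across models.
    vocab_size = len(set().union(*(m["vocab"] for m in models.values())))
    return max(models, key=lambda lang: log_likelihood(models[lang], seq, vocab_size))

# Hypothetical toy data: letters stand in for the units of a phone decoder.
TRAIN = {
    "basque": ["etxea", "kaixo", "eskerrik", "asko", "txakurra", "gizona", "zuhaitza"],
    "spanish": ["casa", "hola", "gracias", "perro", "hombre", "mujer", "ciudad"],
    "french": ["maison", "bonjour", "merci", "chien", "homme", "femme", "ville"],
}
models = {lang: train_ngram_model(words, n=2) for lang, words in TRAIN.items()}

if __name__ == "__main__":
    for word in ["txakurra", "gracias", "bonjour"]:
        print(word, "->", identify_language(models, word))
```

With so little training data the sketch is only reliable on sequences close to those it was trained on, which is the same scarcity problem that motivates the regularized estimation methods for under-resourced languages.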

Full Text