Experiments for the Selection of Sub-Word Units in the Basque Context

Nora Barroso, Manuel Graña, Carmen Hernandez, Karmele López De Ipiña

doi:10.1007/978-3-642-19644-7_52

Abstract

The development of Multilingual Automatic Speech Recognition (ASR) systems involves Acoustic Phonetic Decoding, Language Modeling, Language Identification and the development of appropriate Language Resources. Only a small number of languages possess the resources required for these developments; the remaining languages are under-resourced. In this paper we explore robust Soft Computing strategies for selecting sub-word units in under-resourced languages for ASR in the Basque context. Three languages are analyzed: French, Spanish and the minority language, Basque. The proposed methodology is based on Discriminant Analysis and Principal Component Analysis, robust covariance matrix estimation methods, Support Vector Machines (SVM), Hidden Markov Models (HMMs) and cross-lingual strategies. The new methods considerably improve the accuracy rate obtained on incomplete, small sample sets, providing an excellent tool for managing these kinds of languages.

Keywords: Under-resourced languages, sub-word units, Multilingual Automatic Speech Recognition, Discriminant Analysis, Matrix Covariance Estimation Methods
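The abstract pairs discriminant analysis with robust covariance estimation for small sample sets. As a hedged illustration of why the covariance estimate matters in that regime (not the authors' method, whose estimators this abstract does not specify), the sketch below fits a linear discriminant with a shrunk pooled covariance. It assumes NumPy, synthetic Gaussian data, a hypothetical 20-dimensional feature vector with only 5 samples per class, and a simple shrinkage toward a scaled identity matrix with a hand-picked `alpha`; the function names `pooled_covariance`, `shrink`, `fit_lda` and `predict` are illustrative only.

```python
import numpy as np


def pooled_covariance(X, y):
    """Pooled within-class covariance, normalised by n - K."""
    classes = np.unique(y)
    p = X.shape[1]
    scatter = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (len(X) - len(classes))


def shrink(S, alpha):
    """Assumed estimator: (1 - alpha) * S + alpha * (tr(S) / p) * I."""
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target


def fit_lda(X, y, alpha=0.1):
    """Linear discriminant using the shrunk pooled covariance."""
    classes = np.unique(y)
    precision = np.linalg.inv(shrink(pooled_covariance(X, y), alpha))
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # g_k(x) = x^T P mu_k - 0.5 * mu_k^T P mu_k + log(pi_k), with P = precision
    W = means @ precision
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b


def predict(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, W, b = model
    return classes[np.argmax(X @ W.T + b, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n_per_class = 20, 5  # hypothetical: more features than samples
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, p))
    X1 = rng.normal(3.0, 1.0, size=(n_per_class, p))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)

    S = pooled_covariance(X, y)
    print("rank of raw pooled covariance:", np.linalg.matrix_rank(S), "of", p)
    print("rank after shrinkage:", np.linalg.matrix_rank(shrink(S, 0.1)))

    model = fit_lda(X, y, alpha=0.1)
    print("training accuracy:", np.mean(predict(model, X) == y))
```

With 10 samples in 20 dimensions the pooled covariance has rank at most n - K = 8, so it cannot be inverted directly. Any `alpha > 0` raises every eigenvalue to at least `alpha * tr(S) / p` while preserving the trace, so the discriminant is always defined. Data-driven choices of `alpha`, such as the Ledoit-Wolf estimator, are a common alternative to a fixed value.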
