Study of Large Data Resources for Multilingual Training and System Porting

František Grézl, Ekaterina Egorova, Martin Karafiát

doi:10.1016/j.procs.2016.04.024

Open Access

https://doi.org/10.1016/j.procs.2016.04.024

Journal: Procedia Computer Science
Publication Date: Jan 1, 2016
Citations: 11
License type: cc-by-nc-nd

Affiliation: Brno University of Technology

Abstract

This study investigates how a feature-extraction neural network (NN) trained on a large amount of data from a single language (the “source language”) behaves on a set of under-resourced target languages. The coverage of the source-language acoustic space was varied in two ways: (1) by changing the amount of training data and (2) by altering the level of detail of the acoustic units (by changing the triphone clustering). We observe the effect of these changes on target-language performance in two scenarios: (1) the source-language NNs are used directly, and (2) the NNs are first ported to the target language.

The results show that increasing both the coverage and the level of detail of the source language improves target-language system performance in both scenarios. In the first scenario, both source-language characteristics have about the same effect. In the second scenario, the amount of source-language data is more important than the level of detail.

The possibility of including large data in a multilingual training set was also investigated. Our experiments point to a possible risk of over-weighting the NNs towards the source language with large data. This degrades performance on some of the target languages, compared with a setting where the amounts of data per language are balanced.
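
The over-weighting risk noted in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the authors' procedure: the hour figures, the language names, and the helpers `language_shares` and `balance_by_language` are all hypothetical, and capping every language at the smallest amount is just one simple way to balance a multilingual training set.

```python
def language_shares(hours_per_language):
    """Return each language's fraction of the total training data."""
    total = sum(hours_per_language.values())
    return {lang: hours / total for lang, hours in hours_per_language.items()}


def balance_by_language(hours_per_language, cap_hours=None):
    """Cap every language at cap_hours (default: the smallest amount present)."""
    if cap_hours is None:
        cap_hours = min(hours_per_language.values())
    return {lang: min(hours, cap_hours) for lang, hours in hours_per_language.items()}


# Hypothetical amounts of training data in hours (not taken from the paper).
hours = {"source": 1000.0, "target_a": 10.0, "target_b": 10.0}

unbalanced = language_shares(hours)
balanced = language_shares(balance_by_language(hours))

for lang in hours:
    print(f"{lang}: unbalanced {unbalanced[lang]:.2%}, balanced {balanced[lang]:.2%}")
```

With these made-up figures, the source language supplies about 98% of the unbalanced set but only one third of the balanced one, which is the kind of skew the abstract warns about.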

Similar Papers

Cross-language use of acoustic information for automatic speech recognition
C Nieuwoudt ... Elizabeth C Botha
16 Oct 2000

Cross-language use of acoustic information for automatic speech recognition
C Nieuwoudt ... E.C Botha
Speech Communication | VOL. 38
20 Feb 2002

Cross-Lingual Named Entity Recognition for Heterogenous Languages
Yingwen Fu ... Nankai Lin
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 31
01 Jan 2023

Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments
Minh Van Nguyen ... Thien Huu Nguyen
01 Jan 2020
