Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system

Herman Kamper, Febe De Wet, Thomas Hain, Thomas Niesler

doi:10.1016/j.csl.2014.04.005

Abstract

South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically, we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised: language modelling data has a weaker effect on performance, and the pronunciation dictionary the smallest.

Journal: Computer Speech & Language
Publication Date: May 6, 2014
Citations: 35
