The Usefulness of Imperfect Speech Data for ASR Development in Low-Resource Languages

Jaco Badenhorst, Febe De Wet

doi:10.3390/info10090268

Open Access

Journal: Information
Publication Date: Aug 28, 2019
Citations: 6
License type: CC BY 4.0

Affiliation: Stellenbosch University

Abstract

When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages have ushered in a new data requirement: their development requires hundreds of hours of speech. A suitable strategy for enlarging the speech resources of the South African languages is therefore required. The first possibility is to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project but was excluded from the official corpus, which contains only a curated but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data.

Highlights

The development of language and speech technology requires substantial amounts of appropriate data.
We report results obtained using factorized time-delay neural network (TDNN-F) acoustic models, which have recently been demonstrated to be effective in resource-constrained scenarios [32].
This selection strategy resulted in some improvement relative to the TDNN-BLSTM (bidirectional LSTM) baseline.

Summary

Introduction

The development of language and speech technology requires substantial amounts of appropriate data. Various strategies have been proposed to collect speech and text resources for technology development, for example harvesting existing data such as broadcast news and online publications, crowd-sourcing, web crawling, and dedicated data collection campaigns [7,8,9,10,11,12,13]. Both speech and text data are required for language and speech technology development, and constructing comprehensive text corpora is just as important as creating speech resources. During the Lwazi project, telephone speech was collected (between four and ten hours per language [2]), while the aim of the first NCHLT project was to collect 50–60 h of orthographically transcribed, broadband speech in each of the country’s 11 official languages [4].

Background

Unique and Repeated Prompts

Speaker Mapping

Phone Representations

Experiments

Acoustic Modeling

Phone Recognition Measurement

Baseline Systems

Acoustic Ranking

Data Selection

Cross-Corpus Validation

Discussion

Findings

Conclusions

Similar Papers

A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition
Meiko Fukuda ... Ryota Nishimura
01 Oct 2019

Out Domain Data Augmentation on Punjabi Children Speech Recognition using Tacotron
Taniya Hasija ... Kalpna Guleria
Journal of Physics: Conference Series | VOL. 1950
01 Aug 2021

An Investigation of Multilingual TDNN-BLSTM Acoustic Modeling for Hindi Speech Recognition
Ankit Kumar ... Rajesh Kumar Aggarwal
International Journal of Sensors, Wireless Communications and Control | VOL. 12
01 Jan 2021

Efficient multi-lingual unsupervised acoustic model training under mismatch conditions
Masahiro Saiko ... Chiori Hori
01 Dec 2014
