Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

Andrej Zgank

doi:10.4218/etrij.10.1510.0092

Journal: ETRI Journal
Publication Date: Oct 6, 2010
Citations: 3

Affiliation: University of Maribor

Abstract

This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays an important role for under-resourced languages, because producing manually transcribed speech databases remains expensive and time-consuming. We propose two new methods as part of the training framework. The first method combines initial acoustic models using a data-driven metric. The second method is an improved acoustic training procedure based on unsupervised transcriptions, in which word endings are modified with broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. The evaluation includes three types of acoustic models: baseline, reference content, and framework content models. The best overall result, a word error rate of 18.02%, was achieved with the framework content models. This result is a statistically significant improvement over both the baseline and the reference acoustic models.
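The results above are reported as word error rate (WER). As a minimal sketch, assuming the standard definition WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a word-level Levenshtein alignment and N is the number of reference words, the metric can be computed as follows. The function name `word_error_rate` and the example sentences are illustrative assumptions; they are not taken from the paper, which may have used a dedicated scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein alignment.

    Illustrative sketch: tokenization is plain whitespace splitting, with
    no text normalization (case, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (previous row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the session is now open", "the session now opened")` returns 0.4: one deletion and one substitution over five reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.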
