Advances in subword-based HMM-DNN speech recognition across languages

Peter Smit, Sami Virpioja, Mikko Kurimo

doi:10.1016/j.csl.2020.101158


Open Access

https://doi.org/10.1016/j.csl.2020.101158


Abstract


We describe a novel way to implement subword language models in speech recognition systems based on weighted finite state transducers, hidden Markov models, and deep neural networks. The acoustic models are built on graphemes, so no pronunciation dictionaries are needed, and they can be used together with any type of subword language model, including character models. The advantages of short subword units are good lexical coverage, reduced data sparsity, and the avoidance of vocabulary mismatches in adaptation. Moreover, constructing neural network language models (NNLMs) is more practical, because the input and output layers are small. We also propose methods for combining the benefits of different types of language model units by reconstructing and combining the recognition lattices. We present an extensive evaluation of various subword units on speech datasets of four languages: Finnish, Swedish, Arabic, and English. The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models. Combining different acoustic models and language models with various units improves the results further. For all four datasets we obtain the best results published so far. Our approach performs well even for English, where phoneme-based acoustic models and word-based language models typically dominate: the phoneme-based baseline performance can be reached and improved by 4% using graphemes only, when several grapheme-based models are combined. Furthermore, combining both grapheme and phoneme models yields a state-of-the-art error rate of 15.9% for the MGB 2018 dev17b test. For all four languages we also show that the language models perform reasonably well when only limited training data is available.
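To illustrate why grapheme-based acoustic models remove the need for a pronunciation dictionary, the following is a minimal sketch of how a lexicon for subword units could be derived directly from their spelling. It is not the authors' implementation: the function name `grapheme_lexicon`, the `+` boundary marker, and the example units are all assumptions made for illustration.

```python
def grapheme_lexicon(units, marker="+"):
    """Map each subword unit to the sequence of graphemes that spells it.

    `marker` is a hypothetical boundary symbol attached to subword units
    (an assumption for this sketch); it is stripped before spelling.
    """
    lexicon = {}
    for unit in units:
        # Drop the boundary marker, lowercase, and keep only letters.
        graphemes = [ch for ch in unit.strip(marker).lower() if ch.isalpha()]
        if graphemes:
            lexicon[unit] = graphemes
    return lexicon


# Hypothetical subword units; any segmentation, down to single
# characters, can be handled the same way.
units = ["puhe", "+en", "+tunnistus", "a"]
lex = grapheme_lexicon(units)
for unit, graphemes in lex.items():
    print(unit, " ".join(graphemes))
```

Because every unit is spelled out from its own characters, any subword vocabulary, including single characters, gets a lexicon entry without manual pronunciation work.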


Journal: Computer Speech & Language
Publication Date: Sep 28, 2020
Citations: 30
License type: cc-by


