Abstract

Substantial resources are usually required to develop a robust language model for an open-vocabulary speech recognition system, as out-of-vocabulary (OOV) words can hurt recognition accuracy. In this work, we applied a hybrid lexicon of word and sub-word units to resolve the OOV problem in a resource-efficient way. Because sub-lexical units can be combined to form new words, a compact hybrid vocabulary can be used while still maintaining a low OOV rate. For Thai, a syllable-based unit called a pseudo-morpheme (PM) was chosen as the sub-word unit. To also benefit from the different levels of linguistic information embedded in different input types, a hybrid recurrent neural network language model (RNNLM) framework is proposed. The RNNLM not only models information from multiple input-unit types through a hybrid input vector of words and PMs, but also captures long context history through its recurrent connections. Several hybrid input representations were explored to optimize both recognition accuracy and computational time. The hybrid LM proved both resource-efficient and accurate on two Thai LVCSR tasks: broadcast news transcription and speech-to-speech translation. The proposed hybrid lexicon can constitute an open vocabulary for Thai LVCSR, as it greatly reduces the OOV rate to less than 1 % while using only 42 % of the vocabulary size of the word-based lexicon. In terms of recognition performance, the best proposed hybrid RNNLM, which uses a mixed word-PM input, obtained a 1.54 % relative WER reduction compared with a conventional word-based RNNLM. In terms of computational time, the best hybrid RNNLM also had the lowest training and decoding time among all RNNLMs, including the word-based RNNLM. The overall relative WER reduction of the proposed hybrid RNNLM over a traditional n-gram model is 6.91 %.

Highlights

  • The vocabulary of any active language continues to grow as new words, such as person names, place names, and new technical terms, are introduced every day

  • Experiments evaluated the performance of the proposed hybrid recurrent neural network language model (RNNLM) on both recognition accuracy and computational efficiency

  • The proposed hybrid lexicon can constitute an open vocabulary for Thai large vocabulary continuous speech recognition (LVCSR) as it can greatly reduce the OOV rate to less than 1 % while using only 42 % of the vocabulary size of the word-based lexicon

Summary

Introduction

The vocabulary of any active language continues to grow as new words, such as person names, place names, and new technical terms, are introduced every day. In the first-pass decoding, a hybrid n-gram LM similar to [4] is used to produce a hybrid n-best list in which OOV words can be recognized as sequences of PMs. A hybrid RNNLM, which considers information from different types of input units jointly, is then applied in a second pass to re-score the hybrid n-best list for better recognition accuracy. The vocabulary-growth curve is plotted from 5 million words randomly selected from three text and speech corpora: BEST [10], LOTUS-BN [11], and HITBTEC [12].
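The hybrid lexicon idea above can be sketched in a few lines: keep only the most frequent words as whole-word units, and let every other word contribute sub-word units instead, so OOV words are tokenized as sub-word sequences rather than mapped to an unknown token. This is a minimal illustration, not the paper's implementation; in particular, `segment_to_pms` is a hypothetical stand-in for a real Thai pseudo-morpheme segmenter.

```python
from collections import Counter

def segment_to_pms(word):
    """Hypothetical PM segmenter: splits a word into fixed-length chunks
    as a stand-in for real syllable-based segmentation."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]

def build_hybrid_lexicon(corpus_words, keep_top_n):
    """Keep the keep_top_n most frequent words as whole-word units;
    every remaining word contributes its PM sub-word units instead,
    yielding a compact vocabulary that can still cover new words."""
    counts = Counter(corpus_words)
    frequent = {w for w, _ in counts.most_common(keep_top_n)}
    pms = set()
    for w in counts:
        if w not in frequent:
            pms.update(segment_to_pms(w))
    return frequent | pms

def tokenize_hybrid(sentence_words, lexicon):
    """Map in-vocabulary words to themselves and expand OOV words into
    PM sequences, so decoding never needs an <unk> word token."""
    out = []
    for w in sentence_words:
        if w in lexicon:
            out.append(w)
        else:
            out.extend(segment_to_pms(w))
    return out
```

Because PMs recombine to form unseen words, the hybrid lexicon can stay far smaller than a word-only lexicon while keeping the effective OOV rate low, which is the resource-efficiency argument made in the abstract.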

Thai lexical
Hybrid recurrent neural network language model
Recognition performance of the second-pass re-scoring
Findings
Conclusions