Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks

Kanishka Rao, Francoise Beaufays, Fuchun Peng, Hasim Sak

doi:10.1109/icassp.2015.7178767

Abstract

Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems because they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence-based G2P approaches, LSTMs can take the full grapheme context into account, which transforms the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training joint-sequence-based G2P models requires explicit grapheme-to-phoneme alignments, which are not straightforward to obtain because graphemes and phonemes do not correspond one-to-one. The LSTM-based approach forgoes the need for such explicit alignments. We experiment with unidirectional LSTMs (ULSTM) with different kinds of output delays and with deep bidirectional LSTMs (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model results in a WER of 21.3%, which is a 9% relative improvement over the previous best WER of 23.4% from a hybrid system.
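
As a rough illustration of two points in the abstract, the sketch below shows (a) the standard CTC collapsing rule, which is what lets a CTC output layer avoid explicit grapheme-to-phoneme alignments, and (b) the arithmetic behind the reported 9% relative improvement. The blank symbol, the function names, and the per-step labels for the word "phone" are assumptions made for illustration; they are not taken from the paper, and the paper's actual decoding procedure may differ.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; the abstract does not name one


def ctc_collapse(step_labels):
    """Apply the standard CTC collapsing rule: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(step_labels)]
    return [label for label in merged if label != BLANK]


def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative per-grapheme outputs for "phone" (p, h, o, n, e).
# These labels are made up for the example, not model output.
steps = ["F", "F", "OW", "N", "_"]
print(ctc_collapse(steps))  # ['F', 'OW', 'N']

# WER values quoted in the abstract: 23.4% (previous best) and 21.3% (combined model).
print(round(100 * relative_improvement(23.4, 21.3), 1))  # 9.0
```

Five graphemes map to three phonemes here without any grapheme being assigned to a phoneme in advance: the repeated label and the blank absorb the length mismatch.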

Similar Papers

Maxout neurons based deep bidirectional LSTM for acoustic modeling
Yuan Luo ... Yu Liu
01 Dec 2017

Bidirectional Quaternion Long Short-term Memory Recurrent Neural Networks for Speech Recognition
Titouan Parcollet ... Georges Linares
01 May 2019

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
Zhuo Chen ... John R Hershey
06 Sep 2015

ECG Beat Classification Based on Deep Bidirectional Long Short-Term Memory Recurrent Neural Network
Runchuan Li ... Gang Chen
01 Jan 2019
