Transformer Based Grapheme-to-Phoneme Conversion

Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

doi:10.21437/interspeech.2019-1954

Abstract

The attention mechanism is one of the most successful techniques in deep learning based natural language processing (NLP). The transformer network architecture is based entirely on attention mechanisms and, although it uses no recurrent or convolutional layers, it outperforms sequence-to-sequence models in neural machine translation. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) to their pronunciation (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and on the NetTalk dataset. The results show that transformer based G2P outperforms the convolutional approach in terms of word error rate, and that it significantly improves on previous recurrent approaches (without attention) in both word and phoneme error rate on both datasets. Furthermore, the proposed model is much smaller than the models used in previous approaches.
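The two evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions here are assumptions based on common G2P practice: phoneme error rate (PER) is taken as the total Levenshtein distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and word error rate (WER) as the fraction of words whose predicted pronunciation differs from the reference in any way. The function names and the two-word toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference phonemes into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs: List[List[str]], hyps: List[List[str]]) -> Tuple[float, float]:
    """Return (PER, WER) under the assumed definitions described above."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    wrong_words = sum(1 for r, h in zip(refs, hyps) if r != h)
    return total_edits / total_phonemes, wrong_words / len(refs)


# Toy data (hypothetical): reference and predicted pronunciations for two words.
refs = [["HH", "AH", "L", "OW"], ["W", "ER", "L", "D"]]
hyps = [["HH", "AH", "L", "OW"], ["W", "AO", "L", "D"]]

per, wer = error_rates(refs, hyps)
print(f"PER = {per:.3f}, WER = {wer:.3f}")  # PER = 0.125, WER = 0.500
```

In this toy case one of eight reference phonemes is substituted, so PER is 0.125, while one of the two words contains an error, so WER is 0.5. A single phoneme mistake makes the whole word count as wrong, which is why WER is usually the larger of the two figures.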
