Abstract

Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of a word from its written form. It plays an essential role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion and propose a novel CNN-based sequence-to-sequence (seq2seq) architecture. Our approach includes an end-to-end CNN G2P model with residual connections and, furthermore, a model that uses a convolutional neural network (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme error rates, and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional architecture was also evaluated on the NetTalk dataset. Our method approaches previous state-of-the-art results in terms of phoneme error rate.

Highlights

  • The process of grapheme-to-phoneme (G2P) conversion generates a phonetic transcription from the written form of words

  • Phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network-based architecture was evaluated on the NetTalk dataset

  • The best model with hyperparameter optimization achieved a 5.37% phoneme error rate (PER) and a 23.23% word error rate (WER)

Introduction

The process of grapheme-to-phoneme (G2P) conversion generates a phonetic transcription from the written form of words. The spelling of a word is called a grapheme sequence (or graphemes); the phonetic form is called a phoneme sequence (or phonemes). Developing a phonemic lexicon is essential for text-to-speech (TTS) and automatic speech recognition (ASR) systems. G2P techniques are used for this purpose, and achieving state-of-the-art performance in these systems depends on the accuracy of G2P conversion. In ASR, the acoustic models, pronunciation lexicons, and language models are critical components. Acoustic and language models are built automatically from large corpora, and pronunciation lexicons form the middle layer between them.
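To make the grapheme/phoneme terminology concrete, the following minimal sketch parses a CMUDict-style lexicon entry into its grapheme and phoneme sequences. The entry and its ARPAbet transcription are illustrative, not taken from the paper's data:

```python
# Illustrative CMUDict-style entry: "WORD  PHONE1 PHONE2 ..."
# The transcription shown here is an assumed example.
entry = "PHONEME  F OW1 N IY0 M"

word, *phonemes = entry.split()
graphemes = list(word)  # the spelling, as a grapheme sequence

print(graphemes)  # grapheme sequence (input to a G2P model)
print(phonemes)   # phoneme sequence (output of a G2P model)
```

A seq2seq G2P model, such as the CNN encoder / Bi-LSTM decoder described above, learns to map the grapheme sequence on the left to the phoneme sequence on the right; note that the two sequences generally differ in length ("PHONEME" has 7 graphemes but 5 phonemes), which is why sequence-to-sequence architectures are used rather than per-character classifiers.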

