Abstract

Out-of-vocabulary (OOV) words are among the most challenging problems in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word or character level of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition system (AASR) using phoneme-based subword units. This algorithm inserts the epenthetic vowel እ[ɨ], which is not covered by our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subword units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to yield more accurate speech recognition systems than their character, phoneme, and character-based-subword counterparts. Further improvement was obtained by combining the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language modeling (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine translation and speech translation tasks.
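The paper's own syllabification algorithm is not reproduced on this page. As a rough illustration of the kind of epenthesis it performs, the sketch below breaks consonant clusters with the epenthetic vowel ɨ. The cluster-breaking rule, the function name, and the vowel inventory are assumptions for illustration, not the authors' actual method.

```python
# The seven Amharic vowel orders in IPA (assumed inventory for this sketch).
AMHARIC_VOWELS = {"ə", "u", "i", "a", "e", "ɨ", "o"}

def insert_epenthetic(phonemes):
    """Naive epenthesis sketch: insert ɨ between two adjacent consonants.

    This yields CV(C)-shaped syllables; a single word-final consonant is
    left untouched. The real algorithm in the paper may differ (e.g., in
    how it handles geminates or existing vowels).
    """
    out = []
    for i, p in enumerate(phonemes):
        out.append(p)
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p not in AMHARIC_VOWELS and nxt is not None and nxt not in AMHARIC_VOWELS:
            out.append("ɨ")
    return out

# e.g. the vowel-less phoneme string s-b-r becomes s-ɨ-b-ɨ-r.
```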

Highlights

  • The use of conventional hidden Markov models (HMMs) and deep neural networks (DNNs) in automatic speech recognition (ASR) systems complicates the preparation of the lexicon, acoustic models, and language models [1]

  • We trained our end‐to‐end connectionist temporal classification (CTC)‐attention ASR system using various vocabulary sizes. These vocabulary sizes were selected based on frequently occurring Amharic words

  • These results showed that the performance of the phoneme‐based CTC‐attention method was significantly better than that of the character‐based method…
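The subword vocabularies referred to above are produced by BPE segmentation over phoneme (or character) sequences. As a toy illustration of how BPE builds such a vocabulary by repeatedly merging the most frequent adjacent pair, here is a minimal from-scratch sketch; the function name and corpus format are assumptions, and real systems would use a library such as SentencePiece instead.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Toy BPE sketch.

    corpus: list of token sequences (e.g., phoneme strings of words).
    Returns the learned merge operations and the re-segmented corpus.
    """
    corpus = [tuple(seq) for seq in corpus]
    merges = []
    for _ in range(num_merges):
        # Count all adjacent token pairs across the corpus.
        pairs = Counter()
        for seq in corpus:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere it occurs.
        new_corpus = []
        for seq in corpus:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_corpus.append(tuple(out))
        corpus = new_corpus
    return merges, corpus
```

Varying `num_merges` (and hence the resulting subword vocabulary size) corresponds to the different vocabulary sizes compared in the experiments.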


Introduction

The use of conventional hidden Markov models (HMMs) and deep neural networks (DNNs) in automatic speech recognition (ASR) systems complicates the preparation of the lexicon, acoustic models, and language models [1]. These approaches require linguistic resources, such as a pronunciation dictionary, tokenization, and phonetic context dependencies [2]. End‐to‐end ASR has grown to be a popular alternative that simplifies the conventional ASR model-building process. The end‐to‐end ASR system directly transcribes an input sequence of acoustic…

