Japanese Pronunciation Evaluation Based on DDNN

Deguo Mu, Wei Li, Wei Sun, Guoliang Xu

doi:10.1109/access.2020.3041901

Abstract

In recent years, speech recognition technology based on deep learning models has made great progress, and recognition accuracy has exceeded 90%. Speech evaluation is an important application in foreign language learning, where billions of learners need effective pronunciation practice. However, because speech recognition and speech evaluation have different goals, a single speech recognition model cannot be applied directly to pronunciation evaluation. This paper proposes a DDNN (double-layer deep neural network) model, which comprises a speech-text alignment model and a speech recognition model. In the first layer, the speech alignment model, a new Viterbi-based method is proposed to find the best path for aligning speech with text. In the second layer, which performs speech evaluation and scoring, we are the first to use a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) in the encoding part of the Attention model. The accuracy of the CTC model reaches 76.7%, and that of the Attention model reaches 81.2%. The experimental results show that the speech-text alignment method is effective and that the Attention-based model gives better evaluation results. The FRR (false rejection rate), FAR (false acceptance rate), and DER (diagnostic error rate) of the Attention model were 4.5%, 5.1%, and 17.9%, respectively. In addition, the DDNN model evaluates each sentence within 1 second in the online experiment, so it can also be applied to online real-time pronunciation evaluation.
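To make the three reported rates concrete, here is a minimal sketch of how FRR, FAR, and DER are commonly computed in mispronunciation detection work. The definitions below follow that common convention and are an assumption, since this page does not state the paper's exact formulas. The function name `detection_metrics` and all counts are hypothetical.

```python
def detection_metrics(ta, fr, fa, cd, de):
    """Compute (FRR, FAR, DER) from hypothetical phoneme-level counts.

    ta: correct pronunciations accepted (true acceptance)
    fr: correct pronunciations flagged as errors (false rejection)
    fa: mispronunciations accepted as correct (false acceptance)
    cd: mispronunciations detected and diagnosed as the right phoneme
    de: mispronunciations detected but diagnosed as the wrong phoneme
    """
    tr = cd + de              # all detected mispronunciations (true rejection)
    frr = fr / (ta + fr)      # share of correct phonemes wrongly rejected
    far = fa / (fa + tr)      # share of mispronunciations wrongly accepted
    der = de / (cd + de)      # share of detected errors misdiagnosed
    return frr, far, der


# Made-up counts, chosen only to illustrate the arithmetic.
frr, far, der = detection_metrics(ta=900, fr=100, fa=20, cd=60, de=20)
print(f"FRR={frr:.3f} FAR={far:.3f} DER={der:.3f}")
```

Under these assumed definitions, FRR and FAR describe detection quality, while DER only counts errors that were detected but attributed to the wrong phoneme.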

Highlights

With the advent of globalization, the number of people learning foreign languages is increasing
More and more researchers are becoming involved in the study of CALL (Computer-Aided Language Learning), a research field of speech recognition
With reference to [18], the detection and diagnosis of phonetic errors is divided into three areas: research based on pronunciation scoring, speech recognition networks based on forced alignment, and the study of acoustic characterization and modeling
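Forced alignment, mentioned in the highlights above, is typically computed with a Viterbi search over a frame-by-label score matrix. The sketch below is the textbook monotonic version, not the new Viterbi method the paper proposes. The function name `viterbi_align`, the input layout, and the scores are assumptions made for illustration.

```python
import math


def viterbi_align(log_probs):
    """Return the best monotonic frame-to-label path.

    log_probs[t][s] is an assumed log-score of frame t under the s-th
    label of the transcript. Each frame gets exactly one label, labels
    appear in transcript order, and every label covers at least one frame.
    """
    n_frames, n_labels = len(log_probs), len(log_probs[0])
    if n_frames < n_labels:
        raise ValueError("need at least one frame per label")
    neg_inf = -math.inf
    score = [[neg_inf] * n_labels for _ in range(n_frames)]
    back = [[0] * n_labels for _ in range(n_frames)]
    score[0][0] = log_probs[0][0]  # the path must start at the first label
    for t in range(1, n_frames):
        for s in range(n_labels):
            stay = score[t - 1][s]                           # keep the same label
            move = score[t - 1][s - 1] if s > 0 else neg_inf  # advance one label
            if move > stay:
                score[t][s], back[t][s] = move + log_probs[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + log_probs[t][s], s
    # Backtrace from the last label at the last frame.
    path = [n_labels - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy scores for 5 frames and a 2-label transcript (made-up numbers).
toy = [[-0.1, -3.0], [-0.2, -2.5], [-2.0, -0.3], [-3.0, -0.1], [-2.5, -0.2]]
alignment = viterbi_align(toy)
print(alignment)  # → [0, 0, 1, 1, 1]
```

The returned list gives, for each frame, the index of the transcript label it is aligned to, from which segment boundaries can be read off.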

Summary

INTRODUCTION

With the advent of globalization, the number of people learning foreign languages is increasing. With reference to [18], the detection and diagnosis of phonetic errors is divided into three areas: research based on pronunciation scoring, speech recognition networks based on forced alignment, and the study of acoustic characterization and modeling.

A two-layer deep neural network model based on CTC and Attention is proposed to detect Japanese pronunciation errors, and it achieves state-of-the-art results; 2. Word-level phoneme recognition combining a CNN with an LSTM and Attention is proposed to detect pronunciation errors, and it outperforms detection with the LSTM-based CTC model; 4. Given these pronunciation characteristics of the language, although phoneme-level alignment is achieved in the first model, the output is still produced in word units, to avoid inconsistent phoneme effects that would reduce the accuracy of forced alignment. Another key advantage is that word-level speech does not lose information about the phoneme context.

As its name suggests, CTC (Connectionist Temporal Classification) [44] is designed for temporal classification tasks, that is, sequence labeling problems where the alignment between the inputs and the target labels is unknown.

CTC algorithm
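
CTC handles the unknown alignment by letting the network emit one symbol per frame, including a special blank, and then mapping the frame sequence to a label sequence by merging repeats and dropping blanks. The sketch below shows only that standard collapsing rule. The function name `ctc_collapse`, the blank symbol `-`, and the romanized toy input are illustrative assumptions, not taken from the paper.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Map per-frame CTC outputs to a label sequence.

    Consecutive duplicates are merged first, then blanks are removed,
    so a blank between two identical symbols keeps both of them.
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


# Hypothetical frame-level output for a romanized toy sequence.
frames = list("--kk-oo--nn-n-i")
decoded = ctc_collapse(frames)
print(decoded)  # → ['k', 'o', 'n', 'n', 'i']
```

Note how the two `n` symbols survive because a blank separates them, which is how CTC represents genuinely repeated labels.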

Result

Findings

VIII. Conclusion and future works

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 42	License type: CC BY 4.0
