Abstract

Mispronunciation detection and diagnosis (MDD) is a speech recognition task that aims to recognize the phoneme sequence produced by a speaker, compare it with the canonical phoneme sequence, and identify the type and location of any mispronunciations. However, the scarcity of phoneme-level annotated data limits further improvement of model performance. In this paper, we propose a joint training approach, Acoustic Error_Type Linguistic (AEL), which exploits the error-type, acoustic, and linguistic information in the annotated data and fuses these features through multiple attention mechanisms. To address the uneven distribution of phonemes in MDD data, which can cause a model trained with the CTC loss to make overconfident predictions, we propose a new loss function, Focal Attention Loss, which improves the model on metrics such as F1 score and accuracy. The proposed method was evaluated on the TIMIT and L2-Arctic public corpora. Under ideal conditions, compared with the baseline CNN-RNN-CTC model, it improved F1 score, diagnostic accuracy, and precision by 31.24%, 16.6%, and 17.35%, respectively, reduced the phoneme error rate from 29.55% to 8.49%, and showed significant improvements on other metrics. Furthermore, the experimental results demonstrate that, given a model capable of accurately predicting pronunciation error types, our model achieves results close to those obtained under ideal conditions.
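The abstract does not specify the exact form of Focal Attention Loss, so the sketch below shows only the generic idea it builds on: applying a focal-style modulating factor to the per-sequence CTC loss so that sequences the model already predicts confidently are down-weighted, countering the overconfidence caused by imbalanced phoneme distributions. All names, shapes, and the `alpha`/`gamma` parameters are illustrative assumptions, not the paper's definition.

```python
# Illustrative sketch (not the paper's exact Focal Attention Loss):
# focal-style weighting (1 - p)^gamma applied to per-sequence CTC loss.
import torch
import torch.nn.functional as F

def focal_ctc_loss(log_probs, targets, input_lengths, target_lengths,
                   alpha=1.0, gamma=2.0):
    """CTC loss modulated by a focal factor.

    log_probs: (T, N, C) log-softmax acoustic model outputs.
    targets:   (N, S) phoneme label ids (0 reserved for the CTC blank).
    """
    # Per-sequence negative log-likelihood under the CTC alignment model.
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, reduction="none", zero_infinity=True)
    # p = exp(-ctc) is the model's probability of the target sequence;
    # confident sequences (p near 1) receive a small focal weight.
    p = torch.exp(-ctc)
    return (alpha * (1.0 - p) ** gamma * ctc).mean()

# Toy usage: random scores over C=5 phoneme classes (class 0 = blank).
T, N, C, S = 12, 2, 5, 4
log_probs = torch.randn(T, N, C).log_softmax(-1)
targets = torch.randint(1, C, (N, S))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)
loss = focal_ctc_loss(log_probs, targets, input_lengths, target_lengths)
```

With `gamma=0` the focal factor is 1 and the function reduces to the plain (unweighted) mean CTC loss, which makes the modulation easy to ablate.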

