A hybrid acoustic model based on PDP coding for resolving articulation differences in low-resource speech recognition

Wenbo Zhu, Hao Jin, Jianwen Chen, Lufeng Luo, Jinhai Wang, Qinghua Lu, Aiyuan Li

doi:10.1016/j.apacoust.2021.108601

Abstract

Low-resource automatic speech recognition (ASR) remains a challenging task. Traditional low-resource ASR methods fail to capture pronunciation variations and lack sufficient phone frame alignment capability. Some studies have found that pronunciation variations are mainly reflected in the distribution of resonance peaks for vowels and compound vowels and are particularly prominent in spectrograms. Inspired by this finding, we combine it with deep learning techniques and propose a hybrid acoustic model that addresses the difficulty of capturing pronunciation variation in low-resource ASR. We introduce a pronunciation difference processing (PDP) block to capture resonance peak variations, and we add an improved GRU network at the back end of the model to enhance the alignment of phone frame states. We also introduce multi-head attention to combine coarse- and fine-grained features of the audio and spectrum and to highlight differences in resonance peaks. Finally, we analyze the effect of different structural parameters and coding positions on the results. Our method was evaluated on the Aidatatang and IBAN datasets. Compared with the mainstream baseline model, adding the PDP module reduces WER by 1.84% and 0.26% and SER by 5.2% and 4.3% on the two datasets, respectively. After the improved GRU is added, the reductions are 1.92% and 0.38% in WER and 5.6% and 4.4% in SER. After multi-head attention is introduced, the reductions are 2.33% and 0.45% in WER and 6.0% and 4.8% in SER.
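The abstract reports results as word error rate (WER) and sentence error rate (SER) but does not define them. The following is a minimal sketch of the standard definitions, not the authors' evaluation script: WER is taken as the total token-level edit distance divided by the number of reference tokens, and SER as the fraction of utterances with at least one error. The function names `edit_distance` and `wer_and_ser` are hypothetical, and whitespace tokenization is an assumption (for Mandarin data such as Aidatatang, the scoring unit may differ).

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) for a list of (reference, hypothesis) transcripts.

    Assumption: tokens are separated by whitespace.
    """
    total_errors = 0
    total_ref_tokens = 0
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_errors += errors
        total_ref_tokens += len(ref)
        wrong_sentences += int(errors > 0)
    if total_ref_tokens == 0:
        raise ValueError("references contain no tokens")
    return total_errors / total_ref_tokens, wrong_sentences / len(pairs)


if __name__ == "__main__":
    # Toy transcripts, invented for illustration only.
    demo = [("the cat sat", "the cat sat"), ("a b c d", "a x c")]
    wer, ser = wer_and_ser(demo)
    print(f"WER={wer:.3f} SER={ser:.3f}")
```

Under these definitions a single wrong token makes the whole utterance count as an error for SER, which is consistent with the SER reductions in the abstract being larger than the WER reductions.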

Journal: Applied Acoustics
