Cross-Language End-to-End Speech Recognition Research Based on Transfer Learning for the Low-Resource Tujia Language

Chongchong Yu, Meng Kang, Xueer Liu, Shixuan Xu, Yueqiao Li, Yunbing Chen

doi:10.3390/sym11020179

Abstract

To help rescue and preserve an endangered language, this paper studies an end-to-end speech recognition model based on sample transfer learning for the low-resource Tujia language. At the level of the international phonetic alphabet (IPA) labels of the Tujia language, a Chinese corpus is used as an extension of the Tujia corpus, which mitigates the shortage of Tujia data. This yields a cross-language corpus and an IPA dictionary that is unified between the Chinese and Tujia languages. A convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network are used to extract cross-language acoustic features and to train shared hidden layer weights on the Tujia and Chinese phonetic corpora. Automatic speech recognition for the Tujia language is then realized with an end-to-end method that consists of symmetric encoding and decoding, and transfer learning is used to establish the cross-language end-to-end Tujia recognition model. The experimental results show that the recognition error rate of the proposed model is 46.19%, which is 2.11% lower than that of the model trained only on Tujia language data. The approach is therefore feasible and effective.
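The unified IPA dictionary mentioned in the abstract can be pictured as a single label inventory shared by both languages, so that one output layer serves Tujia and Chinese utterances alike. The sketch below is only an illustration of that idea, not the authors' implementation: the function names `build_unified_labels` and `encode`, the reserved blank index, and every lexicon entry are assumptions made up for the example.

```python
# Toy illustration only: the word keys and IPA symbols below are invented
# placeholders, not entries from the paper's Tujia or Chinese dictionaries.


def build_unified_labels(*lexicons):
    """Merge the IPA symbols of several lexicons into one label-to-index map.

    Index 0 is reserved for a blank symbol (an assumption of this sketch,
    chosen because CTC-style training needs one extra label).
    """
    symbols = set()
    for lexicon in lexicons:
        for ipa_sequence in lexicon.values():
            symbols.update(ipa_sequence)
    labels = ["<blank>"] + sorted(symbols)
    return {symbol: index for index, symbol in enumerate(labels)}


def encode(ipa_sequence, label_to_index):
    """Convert a list of IPA symbols into integer training targets."""
    return [label_to_index[symbol] for symbol in ipa_sequence]


# Hypothetical pronunciation entries for the two languages.
tujia_lexicon = {"word_a": ["p", "a"], "word_b": ["k", "i"]}
chinese_lexicon = {"word_c": ["p", "i"], "word_d": ["m", "a"]}

label_to_index = build_unified_labels(tujia_lexicon, chinese_lexicon)

# Utterances from either language are encoded against the same inventory.
tujia_targets = encode(tujia_lexicon["word_a"], label_to_index)
chinese_targets = encode(chinese_lexicon["word_c"], label_to_index)
```

Because both lexicons map into the same index space, symbols that occur in both languages (here `p`, `a`, and `i`) share one label, which is what would allow the larger corpus to contribute training signal for them.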

Highlights

Endangered languages are non-renewable intangible cultural resources
In the past two years, end-to-end models based on deep learning have improved recognition performance and have become a research direction. Examples include using a convolutional neural network (CNN) or CLDNN to implement an end-to-end model in the connectionist temporal classification (CTC) framework, and the recently proposed low frame rate and chain models, which are based on coarse-grained modelling unit technology [19,20]
We propose to use cross-lingual speech recognition and transfer learning to establish a

Summary

Introduction

Endangered languages are non-renewable intangible cultural resources. The core task in salvaging and preserving endangered languages is to record speech, process the corpus, and preserve language information. In response to this problem, recent international research on deep-learning-based speech recognition has partly focused on end-to-end speech recognition technology [3,4,5,6,7]. This method directly models the mapping between the phoneme sequence or context-dependent phone (CD-phone) sequence and the corresponding phonetic feature sequence, so it does not need forced alignment with an HMM to obtain frame-level annotation. Preserving an endangered language corpus also requires text processing, such as labelling and translation, for recordings of natural language discourse. At present, this text processing has become a bottleneck in the protection of the Tujia language.
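One widely used way to train without frame-level alignment is connectionist temporal classification (CTC), which the highlights and the outline below both mention. A minimal sketch of the CTC collapsing rule, assuming greedy per-frame predictions and a blank symbol written `<b>`: consecutive repeated labels are merged, then blanks are dropped. The function name `ctc_collapse` and the example frames are hypothetical and are not taken from the paper.

```python
def ctc_collapse(frame_labels, blank="<b>"):
    """Collapse per-frame labels into an output sequence using the CTC rule.

    A label is emitted only when it differs from the previous frame's label
    and is not the blank symbol. A blank between two identical labels
    therefore keeps them as two separate outputs.
    """
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


# Eight hypothetical frames collapse to a two-symbol sequence.
frames = ["<b>", "p", "p", "<b>", "a", "a", "a", "<b>"]
decoded = ctc_collapse(frames)
```

Many different frame-level paths collapse to the same label sequence, which is why a CTC-trained network can learn from utterance-level transcripts alone.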

Review of Related Work

Feature Extraction Based on CNN

End-to-End Speech Recognition Based on LSTM-CTC

Proposed Method

Tujia Language Corpus

Extended Speech Corpus

End-to-End

Experimental Environment

Parameters of the Models

Experimental Results

Conclusions

Journal: Symmetry	Publication Date: Feb 2, 2019
Citations: 14	License type: CC BY 4.0

Similar Papers

Detecting cyberbullying text using the approaches with machine learning models for the low-resource Bengali language
Md Nesarul Hoque ... Md Hanif Seddiqui
IAES International Journal of Artificial Intelligence (IJ-AI) | VOL. 13
01 Mar 2024

Urban Road Traffic Flow Prediction with Attention-Based Convolutional Bidirectional Long Short-Term Memory Networks
Zhiquan Liu ... Xiangying Ding
Transportation Research Record: Journal of the Transportation Research Board | VOL. 2677
16 Feb 2023

Prediction of PM2.5 concentration in urban agglomeration of China by hybrid network model
Shuaiwen Wu ... Hengkai Li
Journal of cleaner production | VOL. 374
12 Sep 2022

Gujarati Task Oriented Dialogue Slot Tagging Using Deep Neural Network Models
Rachana Parikh ... Hiren Joshi
01 Jan 2020
