Abstract

To help rescue and preserve an endangered language, this paper studied an end-to-end speech recognition model based on sample transfer learning for the low-resource Tujia language. At the level of the international phonetic alphabet (IPA) labels, a Chinese corpus was used as an extension of the Tujia corpus, which effectively addresses the shortage of Tujia data; on this basis, a cross-language corpus and an IPA dictionary unified between Chinese and Tujia were constructed. A convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network were used to extract cross-language acoustic features and to train shared hidden layer weights on the Tujia and Chinese phonetic corpora. Automatic speech recognition of the Tujia language was realized with an end-to-end method that consists of symmetric encoding and decoding, and transfer learning was used to establish the cross-language end-to-end Tujia language recognition model. The experimental results showed that the recognition error rate of the proposed model is 46.19%, which is 2.11% lower than that of the model trained only on Tujia language data. The approach is therefore feasible and effective.
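The idea of a unified IPA dictionary can be illustrated with a minimal Python sketch. It assumes two pronunciation lexicons that map words to IPA symbol sequences and merges their symbols into one shared label inventory, so that utterances from both languages are encoded against the same output layer. The lexicon entries, the names `tujia_lexicon`, `chinese_lexicon`, `build_ipa_label_map`, and `encode`, and the choice to reserve index 0 for a CTC blank are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical lexicons: word -> IPA symbol sequence.
# These entries are placeholders, not real Tujia or Chinese data.
tujia_lexicon: Dict[str, List[str]] = {
    "tujia_word_1": ["a", "n"],
    "tujia_word_2": ["k", "a"],
}
chinese_lexicon: Dict[str, List[str]] = {
    "chinese_word_1": ["m", "a"],
    "chinese_word_2": ["k", "u"],
}


def build_ipa_label_map(*lexicons: Dict[str, List[str]]) -> Dict[str, int]:
    """Merge the IPA symbols of all lexicons into one shared label map."""
    symbols = set()
    for lexicon in lexicons:
        for pronunciation in lexicon.values():
            symbols.update(pronunciation)
    # Index 0 is left free for a CTC blank label (an assumption of this sketch).
    return {symbol: index for index, symbol in enumerate(sorted(symbols), start=1)}


def encode(pronunciation: List[str], label_map: Dict[str, int]) -> List[int]:
    """Turn an IPA symbol sequence into integer training targets."""
    return [label_map[symbol] for symbol in pronunciation]


label_map = build_ipa_label_map(tujia_lexicon, chinese_lexicon)
tujia_targets = encode(tujia_lexicon["tujia_word_2"], label_map)
chinese_targets = encode(chinese_lexicon["chinese_word_1"], label_map)
```

Because both languages share one label map, a symbol such as `a` receives the same index whether it comes from a Tujia or a Chinese entry, which is what would let the extended corpus contribute training examples for labels that are rare in the smaller corpus.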

Highlights

  • Endangered languages are non-renewable intangible cultural resources

  • In the past two years, end-to-end models based on deep learning, such as convolutional neural network (CNN) or CLDNN models trained in the connectionist temporal classification (CTC) framework and the recently proposed low frame rate and chain models, which are based on coarse-grained modelling units [19,20], have improved recognition performance and have become a research direction

  • We propose to use cross-lingual speech recognition and transfer learning to establish a cross-language end-to-end Tujia language recognition model

Summary

Introduction

Endangered languages are non-renewable intangible cultural resources. The core task of salvaging and preserving endangered languages is recording speech, processing the corpus, and preserving language information. In response to this problem, recent international research on deep-learning-based speech recognition has partly focused on end-to-end speech recognition technology [3,4,5,6,7]. This method directly models the mapping between the phoneme sequence or context-dependent phone (CD-phone) sequence and the corresponding phonetic feature sequence, and it does not need constrained alignment with an HMM to obtain frame-level annotation. Preserving an endangered language corpus requires text processing, such as labelling and translation, of recordings of natural language discourse. At present, this text processing has become a bottleneck in the protection of the Tujia language.
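To make concrete how an end-to-end model can do without frame-level annotation, the following Python sketch shows the collapse step used in CTC-style decoding: per-frame predictions are reduced to a label sequence by merging consecutive repeats and then dropping a blank label. Greedy best-path decoding is used here only because it is the simplest decoder to show; the summary does not say which decoder the paper uses. The function name `ctc_greedy_decode`, the blank index 0, and the toy score matrix are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy scores for six frames over three labels (blank, label 1, label 2).
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]
decoded = ctc_greedy_decode(toy_scores)
```

In this toy input, the per-frame best path is `1, 1, blank, 1, 2, blank`. The first two frames merge into a single label 1, while the blank in the third frame keeps the later label 1 separate, so no frame is ever tied to a label by hand.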

Review of Related Work
Feature Extraction Based on CNN
End-to-End Speech Recognition Based on LSTM-CTC
Proposed Method
Tujia Language Corpus
Extended Speech Corpus
End-to-EndFigure
Experimental Environment
Parameters of the Models
Experimental Results
Conclusions