Abstract

The quality and performance of modern speech technologies rest on the widespread use of machine learning methods. This project studies and implements an end-to-end automatic speech recognition system based on machine learning and develops new mathematical models and algorithms for automatic speech recognition in agglutinative (Turkic) languages. Many studies have shown that deep learning simplifies the training of end-to-end automatic speech recognition systems, which can be trained directly on raw signals without manual feature engineering. Despite good recognition quality, such models have a significant drawback: they require large amounts of training data. This is a serious problem for low-resource languages, particularly Turkic languages such as Kazakh and Azerbaijani. Several methods address this problem for end-to-end recognition of languages belonging to the same (agglutinative) family: transfer learning for low-resource languages and multi-task learning where resources are large. To solve the limited-resource problem efficiently, transfer learning was applied to the end-to-end model: a model trained on the Kazakh dataset was adapted to the Azerbaijani dataset, so the two language corpora were trained jointly. Experiments on the two corpora show that transfer learning reduces the symbol error rate, that is, the phoneme error rate (PER), by 14.23% compared with baseline models (DNN+HMM, WaveNet, and CTC+LM). The resulting model with transfer learning can therefore be used to recognize other low-resource languages.
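To make the transfer-learning setup described above concrete, the following is a minimal sketch in PyTorch. The encoder architecture, file names, and symbol counts are illustrative assumptions, not the authors' implementation: a CTC acoustic model pretrained on the larger Kazakh corpus is reused for Azerbaijani by swapping the output layer and fine-tuning.

```python
import torch
import torch.nn as nn

# Sketch of cross-lingual transfer for an end-to-end CTC model.
# Assumption: a BiLSTM encoder pretrained on the (larger) Kazakh corpus
# is reused, and a fresh output layer is trained for the Azerbaijani
# symbol inventory before full fine-tuning. All sizes are illustrative.

class CTCEncoder(nn.Module):
    def __init__(self, n_feats=80, hidden=512, n_symbols=40):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, num_layers=4,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_symbols + 1)  # +1 for CTC blank

    def forward(self, x):                # x: (batch, time, n_feats)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(-1)

# 1) Load weights pretrained on the Kazakh corpus (path is hypothetical).
model = CTCEncoder(n_symbols=40)
model.load_state_dict(torch.load("ctc_kazakh_pretrained.pt"))

# 2) Replace the output layer to match the Azerbaijani symbol set.
model.out = nn.Linear(model.out.in_features, 35 + 1)

# 3) Fine-tune: freeze the encoder first, training only the new layer;
#    later the encoder can be unfrozen for end-to-end fine-tuning.
for p in model.rnn.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
ctc_loss = nn.CTCLoss(blank=35)  # blank index = last output class
```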

Highlights

  • Automatic speech recognition systems have developed rapidly alongside advances in computing technology

  • A 400-hour speech corpus was assembled for the Kazakh language and a 70-hour speech corpus for the Azerbaijani language

  • Joint connectionist temporal classification (CTC)-attention models are trained on features extracted with the non-negative matrix factorization (NMF) algorithm (a minimal feature-extraction sketch follows this list)

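The highlight above mentions NMF-based features. Below is a minimal sketch of how such features could be extracted, assuming a magnitude spectrogram decomposed with scikit-learn; the file name, component count, and STFT settings are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

# Sketch of NMF-based feature extraction: decompose a non-negative
# magnitude spectrogram S (freq x time) into a spectral basis W and
# time activations H, then use H as the per-frame feature stream.

y, sr = librosa.load("utterance.wav", sr=16000)   # path is hypothetical
S = np.abs(librosa.stft(y, n_fft=512, hop_length=160))  # shape (257, T)

nmf = NMF(n_components=40, init="nndsvd", max_iter=400)
W = nmf.fit_transform(S)     # spectral basis, shape (257, 40)
H = nmf.components_          # activations over time, shape (40, T)

features = H.T               # (T, 40) frame-level features for training
```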


Introduction

Automatic speech recognition systems have developed rapidly alongside advances in computing technology. Accurate results have been achieved in speech recognition, and many models and methods have proven themselves in commercial applications. Chief among these applications are call centers and interactive voice response (IVR) systems for automated access to information, speech chatbots, and the like. Call centers have deployed intelligent voice assistants that process user questions posed in natural language and synthesize responses in the user's language. Classical automatic speech recognition systems consist of three modules: a decoder, an acoustic model, and a language model. In this modular architecture the components are largely independent, and the acoustic model relies on hidden Markov models (HMMs) together with Gaussian mixture models (GMMs), whose states in many cases correspond to pronunciation units [1].
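The division of labor among these modules is conventionally expressed by the Bayes decision rule (standard background, not specific to this paper): the decoder searches for the word sequence $\hat{W}$ that is most probable given the acoustic observations $X$,

$$\hat{W} = \operatorname*{arg\,max}_{W} P(X \mid W)\, P(W),$$

where the HMM/GMM acoustic model supplies the likelihood $P(X \mid W)$ and the language model supplies the prior $P(W)$.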
