Abstract

In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. The proposed framework focuses on integrating multilingual automatic speech recognition (ASR) into a single model, in which an end-to-end paradigm is developed to convert speech waveforms into text directly, without any feature engineering or lexicon. To compensate for the deficiencies of handcrafted feature engineering caused by ATC challenges, including multilingual and multispeaker dialogue and unstable speech rates, a speech representation learning (SRL) network is proposed to capture robust and discriminative speech representations from raw waveforms. A self-supervised training strategy is adopted to optimize the SRL network on unlabeled data and to predict speech features, i.e., wave-to-feature. An improved end-to-end architecture completes the ASR task, in which a grapheme-based modeling unit is applied to address the multilingual ASR issue. To address the scarcity of transcribed samples in the ATC domain, an unsupervised mask-prediction approach is applied to pretrain the backbone network of the ASR model on unlabeled data through a feature-to-feature process. Finally, by integrating the SRL with the ASR, an end-to-end multilingual ASR framework is formulated in a supervised manner, which is able to translate a raw waveform into text within a single model, i.e., wave-to-text. Experimental results on the ATCSpeech corpus demonstrate that the proposed approach achieves high performance with a very small labeled corpus and low resource consumption, reaching a label error rate of only 4.20% on the 58-hour transcribed corpus. Compared to the baseline model, the proposed approach obtains a relative performance improvement of over 100%, which can be further enhanced as the size of the transcribed corpus increases. It is also confirmed that the proposed SRL and training strategies make significant contributions to the final performance. In addition, the effectiveness of the proposed framework is validated on common corpora (AISHELL, LibriSpeech, and cv-fr). More importantly, the proposed multilingual framework not only reduces system complexity but also obtains higher accuracy than independent monolingual ASR models. The proposed approach can also greatly reduce the cost of annotating samples, which helps advance ASR techniques toward industrial applications.
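
To make the wave-to-feature / feature-to-text composition described above concrete, the sketch below shows a minimal PyTorch-style layout of such a pipeline: a convolutional encoder plays the role of the SRL network over raw waveforms, and a Transformer backbone with a grapheme classifier plays the role of the ASR model. This is an illustrative assumption only, not the authors' implementation; the module names, layer sizes, framework choice, and CTC-style output head are all hypothetical, since the abstract describes the architecture only at a high level.

```python
# Illustrative sketch only -- NOT the ATCSpeechNet implementation.
# All module names, dimensions, and layer counts are assumptions for demonstration.
import torch
import torch.nn as nn


class SRLEncoder(nn.Module):
    """Wave-to-feature: convolutional encoder over raw waveforms (SRL stage)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, feat_dim, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=8, stride=4), nn.GELU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=4, stride=2), nn.GELU(),
        )

    def forward(self, wave):                      # wave: (batch, samples)
        x = wave.unsqueeze(1)                     # (batch, 1, samples)
        return self.conv(x).transpose(1, 2)       # (batch, frames, feat_dim)


class ASRBackbone(nn.Module):
    """Feature-to-text: Transformer encoder with a grapheme-level classifier."""
    def __init__(self, feat_dim=256, vocab_size=100):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(feat_dim, vocab_size)

    def forward(self, feats):                     # feats: (batch, frames, feat_dim)
        return self.head(self.encoder(feats))     # (batch, frames, vocab_size)


class WaveToText(nn.Module):
    """End-to-end model: raw wave -> speech features -> grapheme logits.
    In the paper's scheme, the SRL encoder would be pretrained self-supervised
    and the backbone pretrained with mask prediction before supervised fine-tuning."""
    def __init__(self, feat_dim=256, vocab_size=100):
        super().__init__()
        self.srl = SRLEncoder(feat_dim)
        self.asr = ASRBackbone(feat_dim, vocab_size)

    def forward(self, wave):
        return self.asr(self.srl(wave))


if __name__ == "__main__":
    model = WaveToText()
    wave = torch.randn(2, 16000)                  # two 1-second utterances at 16 kHz
    logits = model(wave)
    print(logits.shape)                           # (2, frames, vocab_size)
```

Under these assumptions, the grapheme-level logits would typically be trained with a sequence objective such as CTC on the transcribed corpus after the two pretraining stages.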
