Abstract

Speech-to-text engines are in high demand across many applications and are an essential enabler of human-robot interaction. However, many languages lack labeled speech data, particularly Arabic dialects and other low-resource languages. Self-supervised pretraining combined with self-training on noisy data has emerged as one of the most promising solutions. This article proposes an end-to-end, transformer-based model together with a framework for low-resource languages. The framework incorporates customized audio-to-text processing algorithms to build a highly efficient Jordanian Arabic dialect speech-to-text system. It can ingest data from many sources, deriving ground truth from external sources and thereby speeding up manual annotation. Training combines noisy student training and self-supervised learning to exploit unlabeled data in both the pre- and post-training stages, and it incorporates multiple types of data augmentation. The proposed self-training approach outperforms a fine-tuned Wav2Vec model by 5% in terms of word error rate reduction. The outcome of this work provides the research community with a Jordanian spoken dataset along with an end-to-end approach for low-resource languages, leveraging pretraining, post-training, and the injection of noisy labeled and augmented data with minimal human intervention. It enables new applications in Arabic speech-to-text, such as question-answering systems and intelligent control systems, and adds human-like perception and hearing to intelligent robots.
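As a rough illustration of the noisy student training described above, the sketch below pseudo-labels unlabeled audio with a publicly available Wav2Vec 2.0 teacher and trains a noised student copy with a CTC loss. The checkpoint name, Gaussian-noise augmentation, and helper functions are illustrative assumptions, not the paper's actual implementation or data.

```python
# Illustrative sketch only: the checkpoint, augmentation, and helpers are
# placeholders, not the authors' framework or dataset.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CHECKPOINT = "facebook/wav2vec2-base-960h"  # placeholder; a dialectal Arabic checkpoint would be used in practice

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
teacher = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT).eval()   # pseudo-labels unlabeled clips
student = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)          # trained on labeled + pseudo-labeled, noised audio

def pseudo_label(waveform: np.ndarray, sampling_rate: int = 16_000) -> str:
    """Teacher transcribes an unlabeled clip to create a pseudo label."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = teacher(inputs.input_values).logits
    return processor.batch_decode(torch.argmax(logits, dim=-1))[0]

def add_noise(waveform: np.ndarray, scale: float = 0.005) -> np.ndarray:
    """Simple additive-Gaussian augmentation applied to the student's inputs."""
    return waveform + scale * np.random.randn(*waveform.shape)

def student_ctc_loss(waveform: np.ndarray, transcript: str) -> torch.Tensor:
    """One CTC training loss term for a (possibly pseudo-labeled) example."""
    inputs = processor(add_noise(waveform), sampling_rate=16_000, return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    return student(inputs.input_values, labels=labels).loss
```

In this sketch, unlabeled clips would first pass through `pseudo_label`, and the resulting (audio, transcript) pairs would be mixed with the labeled set before computing `student_ctc_loss` on noised inputs, mirroring the pre-/post-training use of unlabeled data described in the abstract.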
