Abstract

We introduce an automatic Vietnamese speech recognition (ASR) system for converting Vietnamese speech to text under real operating room ambient noise recorded during liver surgery. First, we propose combining a convolutional neural network (CNN) with a bidirectional long short-term memory (BLSTM) network for local speech feature learning, sequence modelling, and transcription. We also extend the CNN-LSTM framework with an attention mechanism that decodes the frames into a sequence of words; the CNN, LSTM, and attention models are combined into a unified architecture. In addition, we combine the connectionist temporal classification (CTC) and attention loss functions during training. The length of the output label sequence from CTC constrains the attention-based decoder predictions when producing the final label sequence. This reduces irregular alignments and speeds up label sequence estimation during training and inference, rather than relying solely on the data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed system is evaluated on a real operating room database. The results show that our method significantly enhances ASR performance, achieving a 13.05% WER and outperforming standard methods.
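
To make the joint training objective concrete, the following is a minimal sketch (not the authors' implementation) of combining a CTC loss with an attention decoder's cross-entropy loss in PyTorch. The interpolation weight `lam`, the tensor layouts, and the padding conventions are illustrative assumptions; the abstract does not specify them.

```python
# Sketch of a hybrid CTC/attention training loss. Assumes the encoder has a
# CTC head producing frame-level log-probabilities and the attention decoder
# produces token-level logits; the weight `lam` is an assumed hyperparameter.
import torch
import torch.nn as nn

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)
att_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

def joint_ctc_attention_loss(ctc_log_probs, att_logits, targets,
                             input_lengths, target_lengths, lam=0.3):
    """ctc_log_probs: (T, N, C) log-probabilities from the CTC head
    att_logits:       (N, L, C) logits from the attention decoder
    targets:          (N, L) padded label ids (padding = -100)
    """
    # CTC branch: padded positions beyond target_lengths are ignored by CTCLoss,
    # so padding ids only need to be mapped to a valid label index.
    ctc_targets = targets.clamp(min=0)
    l_ctc = ctc_loss_fn(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    # Attention branch: token-level cross-entropy over the decoder outputs.
    l_att = att_loss_fn(att_logits.transpose(1, 2), targets)
    # Interpolate the two objectives, as in hybrid CTC/attention training.
    return lam * l_ctc + (1.0 - lam) * l_att
```

In this kind of interpolation, the CTC term encourages monotonic frame-to-label alignments while the attention term retains the decoder's flexibility, which is consistent with the abstract's claim of fewer irregular alignments and faster label sequence estimation.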
