Abstract

This paper presents an online, low-latency, high-performance speech recognition system built on a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client architecture and a context-sensitive-chunk-based approach. For each client, the speech recognition server manages a main thread and a decoder thread, and it additionally runs one shared worker thread. The main thread communicates with the connected client, extracts speech features, and buffers them. The decoder thread performs speech recognition, comprising the proposed multichannel parallel acoustic score computation for the BLSTM acoustic model, the proposed deep neural network (DNN)-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method uses the worker thread to estimate the acoustic scores of a context-sensitive-chunk BLSTM acoustic model on speech features batched across concurrent clients. The proposed DNN-based voice activity detector detects short pauses within long utterances to reduce response latency. In experiments on Korean speech recognition, the proposed acoustic score computation increases the number of concurrent clients from 22 to 44. Combined with a frame-skipping method, the number is further increased to 59 clients with only a small accuracy degradation. Moreover, the proposed DNN-based voice activity detector reduces the average user-perceived latency from 11.71 s to 3.09–5.41 s.
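The batching idea behind the multichannel parallel acoustic score computation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the chunk shapes, the `score_batch` stub standing in for the BLSTM forward pass, and all function names are assumptions introduced for the example.

```python
import numpy as np

# Illustrative shapes: each client submits a context-sensitive chunk of
# FRAMES feature vectors of dimension FEAT_DIM; the acoustic model emits
# STATE_DIM state scores per frame. All values are assumptions.
FRAMES, FEAT_DIM, STATE_DIM = 40, 80, 512

def score_batch(batch):
    """Stand-in for the BLSTM forward pass: one call scores the chunks
    of all concurrent clients at once instead of one call per client."""
    weights = np.zeros((FEAT_DIM, STATE_DIM))  # placeholder parameters
    return batch @ weights                     # (n_clients, FRAMES, STATE_DIM)

def batched_scores(chunks_by_client):
    """Stack the pending chunks from concurrent clients, score them in a
    single batched call, and scatter the scores back per client."""
    ids = list(chunks_by_client)
    batch = np.stack([chunks_by_client[c] for c in ids])  # (n, FRAMES, FEAT_DIM)
    scores = score_batch(batch)
    return {c: scores[i] for i, c in enumerate(ids)}
```

The gain comes from amortizing one model invocation over many clients' chunks, which is what allows the worker thread to serve more concurrent decoder threads.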

Highlights

  • The main thread communicates with the connected client, manages the decoder thread, extracts the speech feature vectors from the received audio segments, and buffers the feature vectors into a ring buffer

  • This paper presents an online multichannel automatic speech recognition (ASR) system employing a bidirectional long short-term memory (BLSTM) acoustic model (AM), which is rarely deployed in industry even though it is one of the best-performing AMs

  • We present a server-client-based online ASR system employing a BLSTM AM, a state-of-the-art AM
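The first highlight describes the main thread buffering extracted feature vectors into a ring buffer that the decoder thread drains. A minimal sketch of such a buffer is shown below; the class name, capacity handling, and overflow policy (overwrite oldest) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

class FeatureRingBuffer:
    """Fixed-capacity ring buffer of feature vectors: the main thread
    pushes frames as audio arrives, the decoder thread pops them."""

    def __init__(self, capacity, feat_dim):
        self.buf = np.zeros((capacity, feat_dim))
        self.capacity = capacity
        self.head = 0   # next write position
        self.tail = 0   # next read position
        self.count = 0  # frames currently buffered

    def push(self, frames):
        """Main thread: append feature vectors, overwriting the oldest
        frames on overflow (an assumed policy for this sketch)."""
        for f in frames:
            self.buf[self.head] = f
            self.head = (self.head + 1) % self.capacity
            if self.count == self.capacity:
                self.tail = (self.tail + 1) % self.capacity
            else:
                self.count += 1

    def pop(self, n):
        """Decoder thread: read up to n frames in arrival order."""
        n = min(n, self.count)
        idx = [(self.tail + i) % self.capacity for i in range(n)]
        self.tail = (self.tail + n) % self.capacity
        self.count -= n
        return self.buf[idx].copy()
```

In a real server the two threads would synchronize access, e.g. with a lock or condition variable, which is omitted here for brevity.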


Summary

Introduction

Deep learning with GPUs and large amounts of speech data has greatly accelerated the advance of speech recognition [1,2,3,4,5]. In line with this advancement, automatic speech recognition (ASR) has been deployed in a wide range of services. Research on ASR deployment can be classified into (a) on-device systems and (b) server-client systems. In both cases, there exists a trade-off between ASR accuracy and real-time performance.


