Toward enriched decoding of Mandarin spontaneous speech

Yu-Chih Deng, Yuan-Fu Liao, Yih-Ru Wang, Sin-Horng Chen

doi:10.1016/j.specom.2023.102983

Abstract

A deep neural network (DNN)-based automatic speech recognition (ASR) method for enriched decoding of Mandarin spontaneous speech is proposed. The method first builds a baseline system composed of a factored time-delay neural network (TDNN-f) acoustic model (AM), a trigram language model (LM), and a recurrent neural network language model (RNNLM) used for rescoring, and generates a word lattice with it. It then sequentially incorporates a multi-task part-of-speech RNNLM (POS-RNNLM), a hierarchical prosodic model (HPM), and a reduplication-word LM (RLM) into the decoding process by expanding the word lattice and rescoring it. This improves recognition performance and enriches the decoding output with syntactic parameters (POS and punctuation marks, PM), prosodic tags (word-juncture break types and syllable prosodic states), and an edited recognition text from which reduplication words have been removed. Experimental results on the Mandarin Conversational Dialogue Corpus (MCDC) show that incorporating the POS-RNNLM and HPM into the baseline system achieves SER, CER, and WER of 13.2 %, 13.9 %, and 19.1 %, which are relative reductions of 7.7 %, 7.9 %, and 5.0 % compared with the baseline system. Furthermore, using the RLM to eliminate reduplication words yields additional relative SER, CER, and WER reductions of 3 %, 4.6 %, and 4.5 %.
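
The relative reductions quoted in the abstract follow the usual definition, (baseline - new) / baseline. As a rough check of that arithmetic, the sketch below back-calculates the baseline error rates implied by the figures reported for the POS-RNNLM and HPM system. The function names are illustrative, and the derived baselines are approximations obtained from rounded numbers, not values stated in the abstract.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction, as a fraction of the baseline error rate."""
    return (baseline - new) / baseline


def implied_baseline(new: float, rel_reduction: float) -> float:
    """Invert the definition above: baseline = new / (1 - relative reduction)."""
    return new / (1.0 - rel_reduction)


# Error rates (%) and relative reductions reported for POS-RNNLM + HPM.
reported = {
    "SER": (13.2, 0.077),
    "CER": (13.9, 0.079),
    "WER": (19.1, 0.050),
}

# Assumption: these baselines are inferred here, not taken from the paper.
baselines = {
    name: implied_baseline(rate, reduction)
    for name, (rate, reduction) in reported.items()
}

for name, (rate, _) in reported.items():
    print(f"{name}: reported {rate:.1f} %, implied baseline ~{baselines[name]:.1f} %")
```

Because the published percentages are rounded, the implied baselines are only accurate to roughly one decimal place. The same inversion is not applied to the RLM figures, since the abstract does not say which reference text those reductions were measured against.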

Published in: Speech Communication