Correct phoneme sequence estimation using recurrent neural network for spoken term detection

Naoki Sawada, Hiromitsu Nishizaki

doi:10.1121/1.4969528

Abstract

This paper proposes a method for estimating correct phoneme sequences with a recurrent neural network (RNN)-based framework for spoken term detection (STD). Reducing automatic speech recognition (ASR) errors is important for obtaining good STD results. We therefore use a long short-term memory (LSTM) network, a type of RNN architecture, as an ASR post-processing step that estimates the correct phoneme sequence of an utterance from the phoneme-based transcriptions produced by ASR systems. We prepare two types of LSTM-based phoneme estimators: one trained on the N-best output of a single ASR system, and the other trained on the 1-best outputs of multiple ASR systems. In an experiment on a correct phoneme estimation task, these LSTM-based estimators generated better phoneme-based N-best transcriptions than those of the best ASR system. The estimator trained on the outputs of multiple ASR systems worked particularly well on this task. In addition, the STD system with the LSTM estimator drastically improved STD performance compared to our previously proposed STD system, which used a conditional random field-based phoneme estimator.
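The abstract does not say how "better" phoneme transcriptions were scored. As an illustration only, the sketch below assumes a common choice, the phoneme error rate (edit distance divided by reference length), and shows how the closest hypothesis in an N-best list could be found. The function names and the toy phoneme strings are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference phonemes into nothing costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def best_of_n(ref: Sequence[str], nbest: List[Sequence[str]]) -> Sequence[str]:
    """Return the N-best hypothesis closest to the reference (oracle choice)."""
    return min(nbest, key=lambda hyp: edit_distance(ref, hyp))


# Toy example with made-up phoneme sequences.
reference = ["k", "a", "t", "o"]
nbest = [["k", "a", "d", "o"], ["a", "t", "o"], ["k", "a", "t", "o"]]
per_first = phoneme_error_rate(reference, nbest[0])
oracle = best_of_n(reference, nbest)
```

Under this assumed metric, an estimator improves on an ASR system when its N-best list contains hypotheses with a lower error rate against the reference than the ASR system's own list does.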

Journal: Journal of the Acoustical Society of America
Publication Date: Oct 1, 2016
Citations: 1

Similar Papers

Experimental studies on effect of speaking mode on spoken term detection
Kallola Rout ... Pappagari Raghavendra Reddy
01 Feb 2015

Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications
Shima Tabibian ... Babak Nasersharif
International Journal of Speech Technology | VOL. 22
25 Jan 2019

Open-vocabulary spoken term detection using graphone-based hybrid recognition systems
Murat Akbacak ... Andreas Stolcke
01 Mar 2008

A Multi-source Knowledge Fusion Strategy to Improve Confidence Measure in a Lattice-based Spoken Term Detection System
Xinglong Gao
Journal of Information and Computational Science | VOL. 11
20 Jul 2014
