Hierarchical search for large-vocabulary conversational speech recognition: working toward a solution to the decoding problem

N. Deshmukh, J. Picone, A. Ganapathiraju

doi:10.1109/79.790985

Abstract

Large vocabulary continuous speech recognition (LVCSR) systems have advanced significantly due to the ability to handle extremely large problem spaces in fairly small amounts of memory. The article introduces the search problem, discusses in detail a typical implementation of a search engine, and demonstrates the efficacy of this approach on a range of problems. The approach presented is scalable across a wide range of applications. It is designed to address research needs, where a premium is placed on the flexibility of the system architecture, and the needs of application prototypes, which require near-real-time speed without a great sacrifice in word error rate (WER). One major area of focus for researchers is the development of real-time systems. With only minor degradations in performance (typically, no more than a 25% increase in WER), the systems described in this article can be transformed into systems that operate at 10×RT or less. There are four active areas of research related to this problem. First, more intelligent pruning algorithms that prune the search space more heavily are required. Look-ahead and N-best strategies at all levels of the system are key to achieving such large reductions in the search space. Second, multi-pass systems that perform a quick search using a simple system, and then rescore only the N-best resulting hypotheses using better models, are very popular for real-time implementation. Third, since much of the computation in these systems is devoted to acoustic model processing, fast-matching strategies within the acoustic model are important. Finally, since Gaussian evaluation at each state in the system is a major consumer of CPU time, vector quantization-like approaches that enable one to compute only a small number of Gaussians per frame have proven successful. In some sense, the Viterbi (1967)-based system presented represents only one path through this continuum of recognition search strategies.
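
The idea of a Viterbi search whose hypothesis space is reduced by pruning can be illustrated with a minimal sketch. This is not the decoder described in the article: the function name `viterbi_beam`, the `beam` parameter, and the two-state toy HMM are all assumptions made for illustration, and a real LVCSR decoder searches a far larger hierarchical network of words, phones, and states.

```python
import math

def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (toy HMM, log domain).

    Hypothetical helper for illustration only. Returns (log_score, state_path).
    """
    # active maps each surviving state to (log score, best path ending there)
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: drop hypotheses scoring more than `beam` below the best.
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        nxt = {}
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][o]
                # Viterbi recombination: keep only the best path into each state.
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = nxt
    return max(active.values(), key=lambda v: v[0])

# Toy model (assumed values, not taken from the article).
states = ["Healthy", "Fever"]
log_init = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
log_trans = {
    "Healthy": {"Healthy": math.log(0.7), "Fever": math.log(0.3)},
    "Fever": {"Healthy": math.log(0.4), "Fever": math.log(0.6)},
}
log_emit = {
    "Healthy": {"normal": math.log(0.5), "cold": math.log(0.4), "dizzy": math.log(0.1)},
    "Fever": {"normal": math.log(0.1), "cold": math.log(0.3), "dizzy": math.log(0.6)},
}
obs = ["normal", "cold", "dizzy"]

score, path = viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=10.0)
```

A smaller `beam` keeps fewer hypotheses alive per frame, trading accuracy for speed; in general, a beam that is too tight can discard the path that would eventually have scored best, which is the kind of trade-off between speed and WER the abstract describes.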

Journal: IEEE Signal Processing Magazine
Publication Date: Jan 1, 1999
Citations: 51
