LVCSR System Research Articles

Recently, several multi-layer perceptron (MLP)-based front-ends have been developed and used for Mandarin speech recognition, often showing significant complementary properties to conventional spectral features. Although these front-ends are widely used in multiple Mandarin systems, no systematic comparison of the different approaches or of their scalability has been reported. The novelty of this correspondence is mainly experimental. In this work, all the MLP front-ends recently developed at multiple sites are described and compared in a systematic manner on a 100-hour setup. The study covers the two main directions along which the MLP features have evolved: the use of different input representations to the MLP and the use of more complex MLP architectures beyond the three-layer perceptron. The results are analyzed in terms of confusion matrices, and the paper discusses a number of novel findings that the comparison reveals. Furthermore, the two best front-ends used in the GALE 2008 evaluation, referred to as MLP1 and MLP2, are studied in a more complex LVCSR system in order to investigate their scalability in terms of the amount of training data (from 100 hours to 1600 hours) and the parametric system complexity (maximum likelihood versus discriminative training, speaker adaptive training, lattice-level combination). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce improvements in the range of 15%-23% relative at the different steps of a multipass system when compared to mel-frequency cepstral coefficient (MFCC) and PLP features, suggesting that the improvements scale with the amount of data and with the complexity of the system. The integration of those features into the GALE 2008 evaluation system provides very competitive performance compared to other Mandarin systems.

LVCSR systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to operate slower than real-time, largely because of the heavy computational overhead of the likelihood evaluation. The objective of our research is to investigate approximate methods that can substantially reduce the computational cost of likelihood evaluation without noticeably degrading the recognition accuracy. In this paper, the most common techniques to speed up the likelihood computation are classified into three categories, namely machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of Gaussian mixtures within a GMM is evaluated and analyzed to show that computations of some Gaussians are unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, namely VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on the analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. The DGS approach is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation. In principle, DGS is an extension of both partial distance elimination and best mixture prediction, and it does not require additional memory for the storage of Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure in the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up the likelihood computation significantly without introducing apparent additional recognition error.
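
As an illustration of one of the techniques the abstract names, the following is a minimal sketch of partial distance elimination for a diagonal-covariance GMM, assuming the state likelihood is approximated by its best-scoring component. It is not the DGS algorithm from the paper and not HTK code; the function names `precompute_constants`, `max_loglik_pde`, and `max_loglik_full` are hypothetical and chosen only for this example. The idea is that each dimension can only lower a component's log score, so accumulation can stop as soon as the partial score falls to or below the best score found so far.

```python
import math


def precompute_constants(weights, variances):
    """Per-component constant: log mixture weight plus the Gaussian
    normalisation term for a diagonal covariance."""
    return [
        math.log(w) - 0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        for w, var in zip(weights, variances)
    ]


def max_loglik_pde(x, means, variances, consts):
    """Best-component log score with partial distance elimination.

    Returns (best_score, best_index, dims_evaluated). A component is
    abandoned as soon as its running score cannot beat the current best.
    """
    best, best_k, evaluated = -math.inf, -1, 0
    for k, (mu, var, c) in enumerate(zip(means, variances, consts)):
        if c <= best:
            continue  # the constant is an upper bound on this component's score
        score = c
        pruned = False
        for xd, md, vd in zip(x, mu, var):
            score -= 0.5 * (xd - md) ** 2 / vd  # each term only lowers the score
            evaluated += 1
            if score <= best:
                pruned = True
                break
        if not pruned:
            best, best_k = score, k
    return best, best_k, evaluated


def max_loglik_full(x, means, variances, consts):
    """Reference version that evaluates every dimension of every component."""
    scores = [
        c - 0.5 * sum((xd - md) ** 2 / vd for xd, md, vd in zip(x, mu, var))
        for mu, var, c in zip(means, variances, consts)
    ]
    k = max(range(len(scores)), key=scores.__getitem__)
    return scores[k], k
```

With the best-component approximation, the pruned and the full version select the same component; the saving comes only from the dimensions that are skipped. How much is skipped depends on the order in which components and dimensions are visited, which is the aspect that shortlist-based methods such as Gaussian selection try to exploit.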

Articles published on LVCSR System

The Effect of Tone Modeling in Vietnamese LVCSR System

Acceleration Strategies for Speech Recognition Based on Deep Neural Networks

Building a speech repository for a Serbian LVCSR system

Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features

Hierarchical and parallel processing of auditory and modulation frequencies for automatic speech recognition

Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition

Efficient likelihood evaluation and dynamic Gaussian selection for HMM-based speech recognition

Improving Keyword Recognition of Spoken Queries by Combining Multiple Speech Recognizer's Outputs for Speech-driven WEB Retrieval Task

An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems

Spoken language recognition-a step toward multilinguality in speech processing
