Large Vocabulary Continuous Speech Recognition System Research Articles

The use of exemplar-based methods, such as support vector machines (SVMs), k-nearest neighbors (kNNs) and sparse representations (SRs), in speech recognition has thus far been limited. Exemplar-based techniques use information about individual training examples and are computationally expensive, which makes them particularly difficult to investigate on large-vocabulary continuous speech recognition (LVCSR) tasks. While LVCSR provides a good testbed for real-world speech recognition problems, research in this area suffers from two main drawbacks. First, the overall complexity of an LVCSR system makes error analysis quite difficult. Second, exploring new research ideas on LVCSR tasks involves training and testing state-of-the-art LVCSR systems, which leads to long turnaround times. This makes a small vocabulary task such as TIMIT more appealing. TIMIT provides a phonetically rich, hand-labeled corpus that allows easy insight into new algorithms. However, research ideas explored on small vocabulary tasks do not always provide gains on LVCSR systems. In this paper, we combine the advantages of small and large vocabulary tasks by taking well-established techniques used in LVCSR systems and applying them to TIMIT to establish a new baseline. We then use these existing LVCSR techniques to create a novel set of exemplar-based sparse representation (SR) features. Using the existing LVCSR techniques alone, we achieve a phonetic error rate (PER) of 19.4% on the TIMIT task. The additional use of SR features reduces the PER to 18.6%. We then apply the SR features to a large vocabulary Broadcast News task, where we achieve a 0.3% absolute reduction in word error rate (WER).
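As a rough illustration of how exemplar-based SR features can be formed, the sketch below codes a test vector as a sparse combination of training exemplars and then sums the coefficient magnitudes per class. The abstract does not say which solver the authors used; ISTA with an L1 penalty is swapped in here for brevity. The function names, the toy dictionary `H`, the labels, and the value of `lam` are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def ista_sparse_code(H, y, lam=0.01, n_iter=500):
    """Approximately solve min_b 0.5*||y - H b||^2 + lam*||b||_1 using ISTA.

    H holds one training exemplar per column; y is the test vector.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (the squared largest singular value of H).
    step = 1.0 / (np.linalg.norm(H, 2) ** 2)
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - y)
        z = beta - step * grad
        # Soft thresholding enforces sparsity.
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return beta

def class_scores(beta, labels, n_classes):
    """Sum |beta| over the exemplars of each class and normalize to sum to 1."""
    mass = np.zeros(n_classes)
    np.add.at(mass, labels, np.abs(beta))
    total = mass.sum()
    return mass / total if total > 0 else mass

# Toy dictionary (assumed data): columns 0-1 are class 0, columns 2-3 are class 1.
H = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
y = np.array([0.8, 0.1])  # test vector lying close to the class-0 exemplars

scores = class_scores(ista_sparse_code(H, y), labels, n_classes=2)
print(scores)  # per-class feature vector; the class-0 entry dominates
```

In a recognizer, a score vector of this kind would be computed per frame and used as an input feature, which is the general idea behind SR features; the dimensions and classes here are deliberately tiny.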

In Korean writing, a space is placed between two adjacent word-phrases, each of which generally corresponds semantically to two or three words in English. If the word-phrase is used as the recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morpheme coarticulation problem arises because the units are short. We propose to use a merged morpheme as the recognition unit, together with pronunciation-dependent entries in the language model (LM), so that we can reduce these difficulties and incorporate the between-word phonology rule into the decoding algorithm of a Korean LVCSR system. Starting from the original morpheme units defined in Korean morphology, we merge pairs of short and frequent morphemes into larger units using a rule-based method and a statistical method. We define the merged morpheme unit as a word and use it as the recognition unit. The performance of the system was evaluated on two business-related tasks: a read speech recognition task and a broadcast news transcription task. The OOV rate was reduced to a level comparable to that of American English in both tasks. In the read speech recognition task, with a 32k vocabulary and a word-based trigram LM computed from a newspaper text corpus, the word error rate (WER) of the baseline system was reduced from 25.0% to 20.0% by cross-word modeling and pronunciation-dependent language modeling, and finally to 15.5% by enlarging the speech database and text corpora. For the broadcast news transcription task, the statistical method reduced the WER of the baseline system without morpheme merging by 3.4% relative, and the two proposed methods yielded similar performance. Applying all the proposed techniques, we achieved 17.6% WER for clean speech and 27.7% for noisy speech.
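To make the idea of statistically merging short, frequent morpheme pairs concrete, here is a minimal sketch of one possible greedy procedure: count adjacent pairs of short units, merge the most frequent pair, and repeat. The abstract does not give the actual merging criterion, so the frequency-based rule, the character-length test for "short", the thresholds, and the romanized toy tokens below are all assumptions made for illustration.

```python
def merge_frequent_pairs(sentences, max_len=2, min_count=2, n_merges=10):
    """Greedily merge the most frequent adjacent pair of short units.

    A unit counts as "short" when its string length is <= max_len
    (a hypothetical proxy). Returns the rewritten sentences and the
    list of merges in the order they were applied.
    """
    sentences = [list(s) for s in sentences]
    merges = []
    for _ in range(n_merges):
        counts = {}
        for s in sentences:
            for a, b in zip(s, s[1:]):
                if len(a) <= max_len and len(b) <= max_len:
                    counts[(a, b)] = counts.get((a, b), 0) + 1
        if not counts:
            break
        # Highest count wins; ties are broken by the pair itself so the
        # result is deterministic.
        (a, b), c = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if c < min_count:
            break
        merges.append((a, b))
        merged = a + "+" + b
        for i, s in enumerate(sentences):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and s[j] == a and s[j + 1] == b:
                    out.append(merged)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            sentences[i] = out
    return sentences, merges

# Toy morpheme sequences (placeholder tokens, not a real segmentation).
corpus = [
    ["na", "neun", "hak", "gyo", "e", "ga", "n", "da"],
    ["neo", "neun", "jip", "e", "ga", "n", "da"],
    ["u", "ri", "neun", "ga", "n", "da"],
]
units, merges = merge_frequent_pairs(corpus)
print(merges)
print(units[2])
```

Because a merged unit is longer than `max_len`, it is not merged again in this sketch, which keeps the new units from growing without bound; a real system would choose its own stopping rule.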

Articles published on Large Vocabulary Continuous Speech Recognition System

Bangladeshi Bangla speech corpus for automatic speech recognition research

Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

The use of pitch in Large-Vocabulary Continuous Speech Recognition System

System for Automatic Transcription of Sessions of the Polish Senate

Lexicon optimization based on discriminative learning for automatic speech recognition of agglutinative language

Issues in developing LVCSR System for Dravidian Languages: An Exhaustive Case Study for Tamil

Language model cross adaptation for LVCSR system combination

Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR

Lecture speech recognition using discrete‐mixture HMMs

Lexical units for Thai LVCSR

Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

A Syllable Lattice Approach to Speaker Verification

Morpheme-Based Modeling of Pronunciation Variation for Large Vocabulary Continuous Speech Recognition in Korean

Generating search query in unsupervised language model adaptation using WWW

Stream Weight Training Based on MCE for Audio-Visual LVCSR

Large-vocabulary continuous speech recognition using linear lexicon search and 1-best approximation tree-structured lexicon search

A large-vocabulary continuous speech recognition system for Hindi

A large-vocabulary continuous speech recognition system for Hindi

Acoustic models of the elderly for large‐vocabulary continuous speech recognition

Korean large vocabulary continuous speech recognition with morpheme-based recognition units
