Trigram Language Model Research Articles

In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an extensive collection of Basque Parliament plenary sessions containing frequent code switching. Since the session minutes are not verbatim transcriptions, only the most reliable speech segments are kept for training. To that end, we use phonetic similarity scores between nominal and recognized phone sequences. The process starts with baseline acoustic models trained on generic out-of-domain data, then iteratively updates the models with the extracted data and applies the updated models to refine the training dataset, until the observed improvement between two consecutive iterations becomes small enough. A development dataset comprising five plenary sessions not used for training has been manually audited for tuning and evaluation purposes. Cross-validation experiments (with 20 random partitions) have been carried out on the development dataset, using the baseline and the iteratively updated models. On average, Word Error Rate (WER) decreases from 16.57% (baseline) to 4.41% (first iteration) and further to 4.02% (second iteration), which corresponds to relative WER reductions of 73.4% and 8.8%, respectively. When considering only Basque segments, WER decreases on average from 16.57% (baseline) to 5.51% (first iteration) and further to 5.13% (second iteration), which corresponds to relative WER reductions of 66.7% and 6.9%, respectively. As a result of this work, a new bilingual Basque–Spanish resource has been produced from Basque Parliament sessions, including 998 h of training data (audio segments plus transcriptions), a 17 h development set designed for tuning and evaluation under a cross-validation scheme, and a fully bilingual trigram language model.
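The selection-and-retraining procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the similarity formula, so the score here is assumed to be 1 minus the Levenshtein distance between the nominal phone sequence (presumably derived from the minutes text) and the recognized one, normalized by the longer length. The threshold (0.9), the stopping rule (less than 1% relative WER gain), and the `recognize`, `train` and `evaluate` callables are hypothetical placeholders for a real ASR toolkit.

```python
from typing import Callable, List, Sequence, Tuple

# (segment id, nominal phone sequence obtained from the minutes)
Segment = Tuple[str, List[str]]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution, free if equal
        prev = curr
    return prev[-1]


def phonetic_similarity(nominal: Sequence[str], recognized: Sequence[str]) -> float:
    """Assumed score in [0, 1]: 1 - edit distance / length of the longer sequence."""
    longest = max(len(nominal), len(recognized))
    return 1.0 if longest == 0 else 1.0 - edit_distance(nominal, recognized) / longest


def select_segments(segments: List[Segment], model: object,
                    recognize: Callable[[object, str], List[str]],
                    threshold: float = 0.9) -> List[Segment]:
    """Keep only segments whose recognized phones closely match the nominal ones."""
    return [(seg_id, phones) for seg_id, phones in segments
            if phonetic_similarity(phones, recognize(model, seg_id)) >= threshold]


def iterative_extraction(segments: List[Segment], baseline_model: object,
                         recognize: Callable[[object, str], List[str]],
                         train: Callable[[List[Segment]], object],
                         evaluate: Callable[[object], float],
                         min_rel_gain: float = 0.01, max_iters: int = 10):
    """Alternate data selection and retraining until the WER gain is small."""
    model, selected = baseline_model, []
    prev_wer = evaluate(model)  # e.g. WER of the out-of-domain baseline on dev data
    for _ in range(max_iters):
        selected = select_segments(segments, model, recognize)
        model = train(selected)
        wer = evaluate(model)
        if prev_wer == 0 or (prev_wer - wer) / prev_wer < min_rel_gain:
            break  # improvement between two iterations is small enough
        prev_wer = wer
    return model, selected
```

Passing the decoder, trainer and evaluator in as functions keeps the loop itself independent of any particular toolkit; each iteration can admit segments that the previous, weaker model rejected.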

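The percentages above can be checked directly. Relative WER reduction is (old - new) / old, and each figure is computed against the immediately preceding system: the 8.8% is relative to the first-iteration WER of 4.41%, not to the baseline. The sketch below reproduces the reported figures; the `wer` helper, which shows the usual (S + D + I) / N definition from substitution, deletion and insertion counts over N reference words, is an illustrative addition rather than the authors' scoring tool.

```python
def wer(substitutions: int, deletions: int, insertions: int, ref_words: int) -> float:
    """Word Error Rate in percent: (S + D + I) / N * 100."""
    return 100.0 * (substitutions + deletions + insertions) / ref_words


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (old - new) / old * 100."""
    return 100.0 * (old_wer - new_wer) / old_wer


# Average WERs (%) as reported in the abstract: baseline, iteration 1, iteration 2.
reported = {
    "all segments": (16.57, 4.41, 4.02),
    "Basque only": (16.57, 5.51, 5.13),
}

for subset, (baseline, first, second) in reported.items():
    print(f"{subset}: {relative_reduction(baseline, first):.1f}%, "
          f"{relative_reduction(first, second):.1f}%")
# all segments: 73.4%, 8.8%
# Basque only: 66.7%, 6.9%

print(wer(3, 1, 0, 100))  # 4.0
```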
We compared the performance of an automatic speech recognition system using n-gram language models, HMM acoustic models, and combinations of the two with the word recognition performance of human subjects who had access either to acoustic information only, to local linguistic context only, or to a combination of both. All speech recordings were taken from Japanese narration and spontaneous speech corpora.

Humans have difficulty recognizing isolated words taken out of context, especially words taken from spontaneous speech, partly because of coarticulation across word boundaries. Human recognition performance improves dramatically when one or two preceding words are added. Short words in Japanese are mainly postpositional particles (e.g., wa, ga, wo, ni), which are function words placed immediately after content words such as nouns and verbs. Short words are therefore highly predictable from the one or two preceding words, and their recognition improves drastically. Providing even more context further improves human prediction performance under text-only conditions (without acoustic signals). It also improves recognition when the speech signal is available, but the improvement is relatively small.

Recognition experiments with an automatic speech recognizer were conducted under conditions almost identical to those of the human experiments. The performance of the acoustic models without any language model, or with only a unigram language model, was greatly inferior to human recognition performance with no context. In contrast, prediction performance using a trigram language model was superior or comparable to human performance when given a preceding and a succeeding word. These results suggest that, to make automatic speech recognizers comparable to humans in recognition performance when the recognizer has limited linguistic context, we must improve the acoustic models rather than the language models.
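To make the unigram-versus-trigram contrast concrete, the sketch below estimates unsmoothed maximum-likelihood trigram probabilities on a tiny, invented corpus of romanized Japanese sentences. The corpus, the function names and the fall-back-to-most-frequent-word rule are illustrative assumptions, not the models used in the study; practical recognizers use smoothed estimates. It shows why particles are easy to predict from two preceding words: after "wa hon", the particle "wo" gets probability 1.0 in this toy data, whereas a context-free unigram guess is always the most frequent word.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

BOS, EOS = "<s>", "</s>"
Context = Tuple[str, str]


def train_ngrams(sentences: List[List[str]]):
    """Count unigrams and trigrams, padding each sentence with two <s> and one </s>."""
    unigrams: Counter = Counter()
    trigrams: DefaultDict[Context, Counter] = defaultdict(Counter)
    for words in sentences:
        unigrams.update(words)
        padded = [BOS, BOS] + words + [EOS]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            trigrams[(w1, w2)][w3] += 1
    return unigrams, trigrams


def trigram_prob(trigrams, w1: str, w2: str, w3: str) -> float:
    """Maximum-likelihood P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2), no smoothing."""
    context = trigrams.get((w1, w2))
    return context[w3] / sum(context.values()) if context else 0.0


def predict_unigram(unigrams: Counter) -> str:
    """Without context, the best guess is always the most frequent word."""
    return unigrams.most_common(1)[0][0]


def predict_trigram(trigrams, unigrams: Counter, w1: str, w2: str) -> str:
    """Most likely next word given two preceding words; falls back to the unigram guess."""
    context = trigrams.get((w1, w2))
    return context.most_common(1)[0][0] if context else predict_unigram(unigrams)


# Invented toy corpus (romanized Japanese), for illustration only.
corpus = [
    "watashi wa gakusei desu",
    "kare wa sensei desu",
    "watashi wa hon wo yomu",
    "kare wa hon wo kau",
]
unigrams, trigrams = train_ngrams([s.split() for s in corpus])

print(predict_unigram(unigrams))                         # wa
print(predict_trigram(trigrams, unigrams, "wa", "hon"))  # wo
print(f"{unigrams['wo'] / sum(unigrams.values()):.3f}")  # 0.111
print(trigram_prob(trigrams, "wa", "hon", "wo"))         # 1.0
```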

Articles published on Trigram Language Model

Toward enriched decoding of mandarin spontaneous speech

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

Spectral warping and data augmentation for low resource language ASR system under mismatched conditions

COTA 2.0: an Automatic Corrector of Tunisian Arabic Social Media Texts

Learner question's correctness assessment and a guided correction method: enhancing the user experience in an interactive online learning system

Model design for grammatical error identification in software requirements specification using statistics and rule-based techniques

N-Gram Language Model based Continuous Voiced Odia Digit Recognition

A Statistical Model for Automatic Error Detection and Correction of Assamese Words

Words prediction based on N-gram model for free-text entry in electronic health records

Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus

Performance of speech recognition units considering morpheme pronunciation variation

Language models, surprisal and fantasy in Slavic intercomprehension

Method to Overcome the Ambiguities in Shallow Parse and Transfer Machine Translation

A study of neural network Russian language models for automatic continuous speech recognition systems

A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory

Implementation of a large-scale language model adaptation in a cloud environment

Effect of acoustic and linguistic contexts on human and machine speech recognition

Handwritten Chinese/Japanese Text Recognition Using Semi-Markov Conditional Random Fields

Candidate expansion algorithm based on weighted syllable confusion matrix for Mandarin LVCSR

Natural language processing with dynamic classification improves P300 speller accuracy and bit rate
