Word-based Language Model Research Articles

We describe a novel way to implement subword language models in speech recognition systems based on weighted finite state transducers, hidden Markov models, and deep neural networks. The acoustic models are built on graphemes, so no pronunciation dictionary is needed, and they can be used with any type of subword language model, including character models. Short subword units offer good lexical coverage, reduce data sparsity, and avoid vocabulary mismatches in adaptation. Constructing neural network language models (NNLMs) is also more practical, because the input and output layers are small. We further propose methods for combining the benefits of different types of language model units by reconstructing and combining the recognition lattices. We present an extensive evaluation of various subword units on speech datasets in four languages: Finnish, Swedish, Arabic, and English. The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models. Combining different acoustic models and language models with various units improves the results further. For all four datasets we obtain the best results published so far. Our approach performs well even for English, where phoneme-based acoustic models and word-based language models typically dominate: using only graphemes, the phoneme-based baseline can be matched and improved on by 4% when several grapheme-based models are combined. Furthermore, combining grapheme and phoneme models yields a state-of-the-art error rate of 15.9% on the MGB 2018 dev17b test set. For all four languages we also show that the language models perform reasonably well when only limited training data is available.
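The lexical-coverage argument for short units can be illustrated with a toy calculation. The sketch below is not the paper's method; it only counts how many test units never occurred in training, first with whole words and then with characters. The Finnish word forms and the "_" word-boundary marker are made up for the illustration.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens whose unit never occurred in training."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def to_chars(words):
    """Split words into character units; '_' is a hypothetical boundary marker."""
    units = []
    for word in words:
        units.extend(list(word))
        units.append("_")
    return units


# Toy data (assumed for illustration): inflected forms of two stems.
train_words = "talo talossa auto autossa".split()
test_words = "talo autosta talosta".split()

word_oov = oov_rate(train_words, test_words)
char_oov = oov_rate(to_chars(train_words), to_chars(test_words))

print(f"word-level OOV rate: {word_oov:.2f}")
print(f"character-level OOV rate: {char_oov:.2f}")
```

With word units, two of the three test forms are unseen; with character units, every test unit was observed in training, which is the coverage benefit the abstract refers to.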


To obtain real-time controlling dynamics in the air traffic system, a framework is proposed to capture and process air traffic control (ATC) speech from radiotelephony communication. A pipeline based on automatic speech recognition (ASR) and controlling instruction understanding (CIU) is designed to convert the ATC speech into ATC-related elements, i.e., the controlling intent and its parameters. A correction procedure is also proposed to improve the reliability of the information obtained by the framework. In the ASR model, an acoustic model (AM), a pronunciation model (PM), and phoneme- and word-based language models (LMs) are proposed to unify multilingual ASR into one model. Based on their tasks, the AM and PM are treated as speech recognition and machine translation problems, respectively. Two-dimensional convolution and average-pooling layers are designed to address the special challenges of ASR in ATC. An encoder–decoder neural network is proposed to translate phoneme labels into word labels, which completes the ASR task. In the CIU model, a joint model based on a recurrent neural network is proposed to detect the controlling intent and label the controlling parameters; the two tasks are solved in one network so that they reinforce each other based on ATC communication rules. Together, the proposed ASR and CIU models convert the ATC speech into ATC-related elements. To further improve the accuracy of the sensing framework, a correction procedure revises minor mistakes in the ASR decoding results using flight information, such as the flight plan and ADS-B data. The proposed models are trained on real operating data and applied at a civil aviation airport in China to evaluate their performance. Experimental results show that the proposed framework can obtain real-time controlling dynamics with a word error rate of only 4%. The decoding efficiency also meets the requirements of real-time applications, with an average real-time factor of 0.147. With the proposed framework and the traffic dynamics it provides, current ATC applications can be accomplished with higher accuracy. In addition, the proposed ASR pipeline is highly reusable and can be applied to other controlling scenes and languages with minor changes.
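The two metrics reported above have standard definitions: word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and real-time factor is decoding time divided by audio duration. The following is a minimal sketch of both, not the authors' scoring code; the example utterance and timings are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds, audio_seconds):
    """Values below 1.0 mean decoding runs faster than real time."""
    return decode_seconds / audio_seconds


# Hypothetical ATC-style utterance with one substituted word.
wer = word_error_rate("climb to flight level three five zero",
                      "climb to flight level three nine zero")
# Hypothetical timings: 1.47 s of decoding for 10 s of audio.
rtf = real_time_factor(1.47, 10.0)

print(f"WER = {wer:.3f}, RTF = {rtf:.3f}")
```

Under this definition, a real-time factor of 0.147 means that one second of audio takes about 0.147 seconds to decode.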



Articles published on Word-based Language Model

An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

A Rule-Based Grapheme-to-Phoneme Conversion System

Automatic Component Prediction for Issue Reports Using Fine-Tuned Pretrained Language Models

Arabic speech recognition using end‐to‐end deep learning

Advances in subword-based HMM-DNN speech recognition across languages

Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian

Latent Relation Language Models

Real-time Controlling Dynamics Sensing in Air Traffic System

A Comparison of Phrase Based and Word based Language Model for Punjabi

Integration of complex language models in ASR and LU systems

Design of language models at various phases of Tamil speech recognition system

Exploiting Morphology and Local Word Reordering in English-to-Turkish Phrase-Based Statistical Machine Translation

Comparison of performance of enhanced morpheme-based language model with different word-based language models for improving the performance of Tamil speech recognition system

A bit progress on word-based language model

Modelling Highly Inflected Slovenian Language
