We present automatic speech recognition (ASR) systems for Tamil and Kannada based on subword modeling, designed to handle the unlimited vocabulary that arises from the highly agglutinative nature of these languages. We propose a variant of the byte pair encoding (BPE) algorithm, which we call extended-BPE, and also use the Morfessor tool to segment each word into subwords. We incorporate maximum likelihood and Viterbi estimation techniques within the weighted finite-state transducer (WFST) framework in these algorithms to learn a subword dictionary from a large text corpus. Using the learned subword dictionary, the words in the training transcriptions are segmented into subwords. We then train deep neural network ASR systems that recognize the subword sequence for any given test speech utterance. The output subword sequence is post-processed using deterministic rules to obtain the final word sequence. Because of this subword design, the number of words that can be recognized is much larger than the number of words in the training corpus. For Tamil ASR, we use 152 hours of data for training and 65 hours for testing, whereas for Kannada ASR, we use 275 hours for training and 72 hours for testing. Experimenting with different combinations of segmentation and estimation techniques, we find that the word error rate (WER) is substantially lower than that of the baseline word-level ASR, with a maximum absolute WER reduction of 6.24% for Tamil and 6.63% for Kannada. Further, comparing our results with those of an end-to-end ASR model available on GitHub, we find that our subword language models perform comparably to, or better than, recent end-to-end ASR models for Tamil and Kannada.
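To make the subword pipeline concrete, the following is a minimal illustrative sketch of the two word-level operations the abstract describes: segmenting a word into dictionary subwords, and the deterministic post-processing that rejoins a recognized subword sequence into words. The greedy longest-match strategy, the "+" continuation marker, and the toy dictionary are assumptions for illustration only; they are not the paper's extended-BPE or Morfessor algorithms.

```python
# Illustrative sketch only. Assumes a learned subword dictionary and uses a
# '+' marker on word-internal subwords (an assumption, not the paper's scheme)
# so that the word sequence can be recovered deterministically.

def segment(word, subwords, max_len=10):
    """Split a word into subwords by greedy longest match against the
    dictionary, falling back to single characters when nothing matches.
    Every piece except the last carries a '+' continuation marker."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in subwords or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return [p + "+" for p in pieces[:-1]] + pieces[-1:]

def join(subword_seq):
    """Deterministic post-processing: merge '+'-marked subwords back
    into whole words."""
    words, current = [], ""
    for piece in subword_seq:
        if piece.endswith("+"):
            current += piece[:-1]     # word-internal piece: keep accumulating
        else:
            words.append(current + piece)  # word-final piece: emit the word
            current = ""
    return words

# Toy example with a hypothetical dictionary (Latin transliteration).
subwords = {"vana", "kkam"}
seq = segment("vanakkam", subwords)
print(seq, "->", join(seq))  # ['vana+', 'kkam'] -> ['vanakkam']
```

In an actual ASR system, the recognizer emits the subword sequence directly, and only the `join` step runs at the output; the segmentation step is applied offline to the training transcriptions.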