Automatic Speech Recognition Models Research Articles

End-to-end (E2E) automatic speech recognition (ASR) models, which consist of deep learning models, are able to perform ASR tasks using a single neural network. These models should be trained using a large amount of data; however, collecting speech data which matches the targeted speech domain can be difficult, so speech data is often used that is not an exact match to the target domain, resulting in lower performance. In comparison to speech data, in-domain text data is much easier to obtain. Thus, traditional ASR systems use separately trained language models and HMM-based acoustic models. However, it is difficult to separate language information from an E2E ASR model because the model learns both acoustic and language information in an integrated manner, making it very difficult to create E2E ASR models for specialized target domain which are able to achieve sufficient recognition performance at a reasonable cost. In this paper, we propose a method of replacing the language information within pre-trained E2E ASR models in order to achieve adaptation to a target domain. This is achieved by deleting the “implicit” language information contained within the ASR model by subtracting the source-domain language model trained with a transcription of the ASR’s training data in a logarithmic domain. We then integrate a target domain language model through addition in the logarithmic domain. This subtraction and addition to replace of the language model is based on Bayes’ theorem. In our experiment, we first used two datasets of the Corpus of Spontaneous Japanese (CSJ) to evaluate the effectiveness of our method. We then we evaluated our method using the Japanese Newspaper Article Speech (JNAS) and CSJ corpora, which contain audio data from the read speech and spontaneous speech domain, respectively, to test the effectiveness of our proposed method at bridging the gap between these two language domains. Our results show that our proposed language model replacement method achieved better ASR performance than both non-adapted (baseline) ASR models and ASR models adapted using the conventional Shallow Fusion method.

This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy. Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (a) one speaker-independent ASR system trained and optimized for typical speech and (b) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word error rates were calculated for each speech model, read versus conversational, and subject. Linear mixed-effects models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap. We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy. We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.

Automatic Speech Recognition Models Research Articles

Related Topics

Articles published on Automatic Speech Recognition Models

Adversarial Attack and Defense for Commercial Black-box Chinese-English Speech Recognition Systems

Sub-layer feature fusion applied to transformer model for automatic speech recognition

Heterogeneous Hierarchical Fusion Network for Multimodal Sentiment Analysis in Real-World Environments

Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding

Effect of simulated hearing loss on automatic speech recognition for an android robot-patient.

Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.

Refining maritime Automatic Speech Recognition by leveraging synthetic speech

Cascaded cross-modal transformer for audio–textual classification

Decoupled structure for improved adaptability of end-to-end models

Perception of Phonological Assimilation by Neural Speech Recognition Models

Recognition of target domain Japanese speech using language model replacement

Enhancing Air Traffic Control Communication Systems with Integrated Automatic Speech Recognition: Models, Applications and Performance Evaluation.

Automatic transcription system for parliamentary debates in the context of assembly of the republic of Portugal

Smart reception: An artificial intelligence driven bangla language based receptionist system employing speech, speaker, and face recognition for automating reception services

Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech.

End-To-End deep neural models for Automatic Speech Recognition for Polish Language

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition.

Automatic speech recognition for Indonesian medical dictation in cloud environment

System dedicated to Polish Automatic Speech Recognition - overview of solutions

Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Automatic Speech Recognition Models Research Articles

Related Topics

Articles published on Automatic Speech Recognition Models

Adversarial Attack and Defense for Commercial Black-box Chinese-English Speech Recognition Systems

Sub-layer feature fusion applied to transformer model for automatic speech recognition

Heterogeneous Hierarchical Fusion Network for Multimodal Sentiment Analysis in Real-World Environments

Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding

Effect of simulated hearing loss on automatic speech recognition for an android robot-patient.

Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.

Refining maritime Automatic Speech Recognition by leveraging synthetic speech

Cascaded cross-modal transformer for audio–textual classification

Decoupled structure for improved adaptability of end-to-end models

Perception of Phonological Assimilation by Neural Speech Recognition Models

Recognition of target domain Japanese speech using language model replacement

Enhancing Air Traffic Control Communication Systems with Integrated Automatic Speech Recognition: Models, Applications and Performance Evaluation.

Automatic transcription system for parliamentary debates in the context of assembly of the republic of Portugal

Smart reception: An artificial intelligence driven bangla language based receptionist system employing speech, speaker, and face recognition for automating reception services

Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech.

End-To-End deep neural models for Automatic Speech Recognition for Polish Language

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition.

Automatic speech recognition for Indonesian medical dictation in cloud environment

System dedicated to Polish Automatic Speech Recognition - overview of solutions

Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.