Automatic Speech Recognition Results Research Articles

Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which patients produce sound additions, deletions, substitutions, or distortions and/or slow, segmented speech. Because transcription is laborious and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of patients with PPAOS to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare to those noted in manual transcriptions. Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each; the recordings were transcribed both manually and with wav2vec 2.0. The WER and the phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions. Mean overall WER was 0.88 for patients and 0.33 for controls. WER correlated significantly with AOS severity and accurately distinguished patients from controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions could. Agreement between the ASR and human transcriptions in the number of phonetic and prosodic errors was poor. This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and in evaluating PPAOS severity, but it does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for transcribing speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations.
https://doi.org/10.23641/asha.26359417.
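The abstract above reports WER values without stating how they were computed. As a rough illustration only, the sketch below uses the common definition of WER: the word-level edit distance (substitutions, deletions, insertions) between a reference and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not details taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which may differ from the study's procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a hypothesis that contains many inserted words can yield a WER above 1.0, so the value is not bounded like a percentage.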

The performance of automatic speech recognition (ASR) may degrade on accented speech because such speech differs linguistically from standard speech. Conventional accented speech recognition studies have used the accent embedding method, in which accent embedding features are fed directly into the ASR network. Although this method improves accented speech recognition, it has drawbacks, such as increased computational cost. This study proposes an efficient method of training an ASR model for accented speech in a domain adversarial way, based on the Domain Adversarial Neural Network (DANN). DANN serves as a domain adaptation technique for settings in which the training data and test data have different distributions. Our approach is therefore expected to yield a reliable ASR model for accented speech by reducing the distribution differences between accented speech and standard speech. DANN has three sub-networks: the feature extractor, the domain classifier, and the label predictor. To adapt DANN to accented speech recognition, we constructed these three sub-networks independently, considering the characteristics of accented speech. In particular, we used an end-to-end framework based on Connectionist Temporal Classification (CTC) to develop the label predictor, the module that directly determines the ASR results. To verify the efficiency of the proposed approach, we conducted several accented speech recognition experiments on four English accents: Australian, Canadian, British (England), and Indian. The experimental results showed that the proposed DANN-based model outperformed the baseline model for all accents, indicating that end-to-end domain adversarial training effectively reduced the distribution differences between accented speech and standard speech.
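The abstract does not say how the adversarial signal reaches the feature extractor. In the general DANN formulation this is usually done with a gradient reversal step: it passes features through unchanged in the forward direction and flips the sign of the domain classifier's gradient in the backward direction, so the feature extractor is pushed toward features the domain classifier cannot separate. The pure-Python sketch below illustrates that idea only; the class `GradientReversal`, the function `feature_extractor_gradient`, and the weight `lam` are hypothetical names for this example and are not taken from the paper.

```python
from typing import List


class GradientReversal:
    """Identity on the forward pass; negates and scales gradients on the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        # lam is an assumed trade-off weight between the label and domain objectives.
        self.lam = lam

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier: List[float]) -> List[float]:
        # The sign flip makes the feature extractor work against the domain classifier.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_extractor_gradient(
    grad_label: List[float],
    grad_domain: List[float],
    grl: GradientReversal,
) -> List[float]:
    """Combine the label-predictor gradient with the reversed domain-classifier gradient."""
    reversed_domain = grl.backward(grad_domain)
    return [gl + gd for gl, gd in zip(grad_label, reversed_domain)]
```

In a real system the same behavior would typically be expressed as a custom backward rule in a deep learning framework rather than as explicit list arithmetic.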

Articles published on Automatic Speech Recognition Results

Lexical Error Guard: Leveraging Large Language Models for Enhanced ASR Error Correction

Automatic Speech Recognition in Primary Progressive Apraxia of Speech.

Is the Same Performance Really the Same?: Understanding How Listeners Perceive ASR Results Differently According to the Speaker's Accent

Deep Neural Networks-based Classification Methodologies of Speech, Audio and Music, and its Integration for Audio Metadata Tagging

Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks

Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition

The relationship between word error rate and perceptual judgment

Reviewing Speech Input with Audio

Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders

Automatic perceptual judgment using neural networks

Building and evaluation of a real room impulse response dataset

Speech-driven mobile games for speech therapy: User experiences and feasibility

Predicting speech intelligibility with deep neural networks

Spoken Term Detection Using Spoken Document Index Based on Keywords Collected from Automatic Speech Recognition Result

Speech Recognition-Based Emergency Situation Control

Enhancements in Statistical Spoken Language Translation by De-normalization of ASR Results

Using automatic speech recognition to identify pediatric speech errors

Improving Domain-independent Cloud-Based Speech Recognition with Domain-Dependent Phonetic Post-Processing

On Grounding Natural Kind Terms in Human-Robot Communication

Experimental research of influence of acoustic noises of different types on results of automatic speech recognition
