Automatic Speech Recognition Experiments Research Articles

The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data package. The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. For purposes of ASR experiments, the participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set to evaluate ASR error rate. The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson's disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%. Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly improve speech technology for people with disabilities. By providing these data to researchers, the SAP intends to significantly accelerate research into accessible speech technology. https://doi.org/10.23641/asha.27078079.
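The word error rates quoted in this abstract can be related with a short calculation. The sketch below is illustrative only, not the SAP evaluation pipeline: it assumes the standard definition of WER (word-level edit distance divided by the number of reference words), and the function names `word_error_rate` and `relative_reduction` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate drops, relative to its starting value."""
    return (before - after) / before


# Figures from the abstract: 36.3% WER before fine-tuning, 23.7% after.
print(f"{relative_reduction(36.3, 23.7):.1%}")
```

Under these assumptions, the drop from 36.3% to 23.7% is a relative reduction of roughly 35%, still well above the 3.4% the baseline achieves for typical speakers.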

Extensive use of Intelligent Personal Assistants (IPAs) and biometrics in day-to-day life calls for privacy preservation when handling personal data. To that end, efforts have been made to protect the personally identifiable characteristics of the human voice using different speaker anonymization techniques. In this paper, we propose using a Cycle-Consistent Generative Adversarial Network (CycleGAN) to modify (transform) the speaker's gender as well as other prosodic aspects using Mel cepstral coefficients (MCEPs) and fundamental frequency (F0). For effective anonymization in the context of voice privacy, we propose two-level (i.e., double) anonymization, where first-level anonymization is done using CycleGAN, followed by second-level anonymization using time-scale modification. Speaker anonymization and intelligibility are measured objectively using automatic speaker verification (ASV) and automatic speech recognition (ASR) experiments, respectively, on the development and test sets of the Librispeech and VCTK datasets. For CycleGAN-based anonymization, the average % EERs (% WERs) are 40.3% (8.89%) and 40.95% (9.37%) with original enrollments and anonymized trials of the development and test datasets, respectively. The average % EERs (% WERs) for double anonymization are 46.19% (9.95%) and 44.76% (10.34%) with original enrollments and anonymized trials of the development and test datasets, respectively. For the voice privacy evaluation, the performance of the ASV system matters most when both the enrollments and the trials are anonymized (the A-A case), which is also briefly discussed in this work. The average % EERs for the A-A case (test set) are 24.29% and 2.81% using CycleGAN-based anonymization and double anonymization, respectively. Objective evaluation for a more advanced attack model (i.e., an attacker having anonymized data) is also explored in this study. The performance reflects the robustness of the proposed anonymization approach with respect to voice privacy. Subjective tests with 101 listeners, together with the corresponding analysis of variance (ANOVA) and Tukey–Kramer post hoc tests, are also carried out to establish the statistical significance of our results. The subjective tests show that the CycleGAN and double anonymization approaches give better naturalness, intelligibility, and speaker dissimilarity than the state-of-the-art x-vector-based baseline system.
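The equal error rate (EER) used above as the anonymization measure can be sketched as follows. This is a minimal illustration and not the evaluation code used in the paper: it assumes the usual definition of EER as the operating point where the false acceptance rate equals the false rejection rate, and the function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: ASV scores for same-speaker trials.
    nontarget_scores: ASV scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for threshold in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Made-up scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

In the voice privacy setting, a higher EER means the ASV system has more difficulty linking anonymized speech to the original speaker, while a low WER indicates that the anonymized speech remains intelligible to the ASR system.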

Articles published on Automatic Speech Recognition Experiments

Community-Supported Shared Infrastructure in Support of Speech Accessibility.

A Bilingual Basque–Spanish Dataset of Parliamentary Sessions for the Development and Evaluation of Speech Technology

Voice privacy using CycleGAN and time-scale modification

A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training

Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition

A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement

Feature Extraction Algorithm Using New Cepstral Techniques for Robust Speech Recognition

Building and evaluation of a real room impulse response dataset

Exploiting alternative acoustic sensors for improved noise robustness in speech communication

Unsupervised modulation filter learning for noise-robust speech recognition

Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models

Integration of Optimized Modulation Filter Sets Into Deep Neural Networks for Automatic Speech Recognition

Multi-channel non-negative matrix factorization with binary mask initialization for automatic speech recognition

Deriving disyllabic word variants from a Chinese conversational speech corpus.

Syntactic and Semantic Features For Code-Switching Factored Language Models

Weighted finite-state transducer-based dysarthric speech recognition error correction using context-dependent pronunciation variation modelling

General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes

The MoveOn database: motorcycle environment speech and noise database for command and control applications

Improving automatic speech recognition by learning from human errors

Robust speech recognition using spatial–temporal feature distribution characteristics
