Automatic Speech Recognition Performance Research Articles

We propose an integrated end-to-end automatic speech recognition (ASR) paradigm that jointly learns the front-end speech signal processing and the back-end acoustic modeling. We believe that “only good signal processing can lead to top ASR performance” in challenging acoustic environments. This notion leads to a unified deep neural network (DNN) framework for distant speech processing that can achieve both high-quality enhanced speech and high-accuracy ASR simultaneously. Our goal is accomplished by two techniques: (i) a reverberation-time-aware DNN-based speech dereverberation architecture that handles a wide range of reverberation times to enhance the quality of reverberant and noisy speech, followed by (ii) DNN-based multicondition training that takes both clean-condition and multicondition speech into consideration and exploits data acquired and processed with multichannel microphone arrays to improve ASR performance. The final end-to-end system is established by jointly optimizing the speech enhancement and recognition DNNs. The recent REverberant Voice Enhancement and Recognition Benchmark (REVERB) Challenge task is used as a test bed for evaluating the proposed framework. We first report objective measures of enhanced speech on the simulated data test set that are superior to those listed in the 2014 REVERB Challenge Workshop. Moreover, we obtain the best single-system word error rate (WER) of 13.28% on the 1-channel REVERB simulated data with the proposed DNN-based pre-processing algorithm and clean-condition training. With joint training, more discriminative ASR features, and improved neural-network-based language models, a low single-system WER of 4.46% is attained. Next, a new multi-channel-condition joint learning and testing scheme delivers a state-of-the-art WER of 3.76% on the 8-channel simulated data with a single ASR system. Finally, we also report on preliminary yet promising experiments with the REVERB real test data.
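The results above are stated as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. As a rough illustration only, the sketch below computes WER with a word-level edit distance. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; this is not the scoring tool used in the REVERB Challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.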

[Figure: Ego-noise suppression achieves speech recognition even during motion]

This paper addresses ego-motion noise suppression for a robot. Many ego-motion noise suppression methods use motion information such as the position, velocity, and acceleration of each joint to infer ego-motion noise. However, such inferences are not reliable, since motion information and ego-motion noise are not always correlated. We propose a new framework for ego-motion noise suppression based on single-channel processing using only acoustic signals captured with a microphone. In the proposed framework, ego-motion noise features and their number are automatically estimated in advance from an ego-motion noise input using Infinite Non-negative Matrix Factorization (INMF), a non-parametric Bayesian model that does not use explicit motion information. The proposed Semi-Blind INMF (SB-INMF) is then applied to an input signal that consists of both the target and ego-motion noise signals. The ego-motion noise features obtained with INMF are used as inputs to SB-INMF and are treated as fixed features, while features for the target signal are newly estimated. Finally, the target signal is extracted with SB-INMF using these newly estimated features. The proposed framework was applied to ego-motion noise suppression on two types of humanoid robots. Experimental results showed that ego-motion noise was suppressed effectively and efficiently, in terms of both signal-to-noise ratio and automatic speech recognition performance, compared to a conventional template-based ego-motion noise suppression method using motion information. Thus, the proposed method worked properly on a robot without a motion information interface.

This work is an extension of our publication “Taiki Tezuka, Takami Yoshida, Kazuhiro Nakadai: Ego-motion noise suppression for robots based on Semi-Blind Infinite Non-negative Matrix Factorization, ICRA 2014, pp.6293-6298, 2014.”
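The semi-blind idea in the abstract above (learn noise features from noise-only audio, hold them fixed, and estimate only the target features on the mixture) can be illustrated with ordinary NMF. The sketch below is a simplified stand-in, not the authors' SB-INMF: it uses standard multiplicative updates for the Euclidean cost with a fixed, user-chosen number of bases, whereas INMF estimates that number with a non-parametric Bayesian model. All function and parameter names (`nmf`, `semi_blind_nmf`, `k_target`, and so on) are hypothetical, and the inputs are assumed to be non-negative magnitude spectrograms of shape (frequency bins, frames).

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, V ~ W @ H, via multiplicative updates (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def semi_blind_nmf(V, W_noise, k_target, n_iter=200, seed=0, eps=1e-9):
    """Factorize a mixture with fixed noise bases and free target bases."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_noise = W_noise.shape[1]
    W_target = rng.random((n_freq, k_target)) + eps
    H = rng.random((k_noise + k_target, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_noise, W_target])
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Only the target bases are updated; W_noise is never modified.
        H_target = H[k_noise:]
        W_target *= (V @ H_target.T) / (W @ H @ H_target.T + eps)
    W = np.hstack([W_noise, W_target])
    # Soft mask: share of each time-frequency bin explained by target bases.
    mask = (W_target @ H[k_noise:]) / (W @ H + eps)
    return V * mask, W_target, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    noise_only = rng.random((64, 100))   # stand-in for a noise-only spectrogram
    mixture = rng.random((64, 120))      # stand-in for target plus noise
    W_noise, _ = nmf(noise_only, k=4)
    target_est, W_target, H = semi_blind_nmf(mixture, W_noise, k_target=6)
    print(target_est.shape, W_target.shape, H.shape)
```

In a real pipeline the estimated target magnitude would be combined with the mixture phase and inverted back to a waveform before recognition; that step is omitted here.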

Articles published on Automatic Speech Recognition Performance

Retrospective Analysis of Clinical Performance of an Estonian Speech Recognition System for Radiology: Effects of Different Acoustic and Language Models

Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition

Continuous Punjabi speech recognition model based on Kaldi ASR toolkit

Adaptive neuro-fuzzy inference system for evaluating dysarthric automatic speech recognition (ASR) systems: a case study on MVML-based ASR

DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR

Multi-Dialectical Languages Effect on Speech Recognition: Too Much Choice Can Hurt

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering

Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners With Simulated Age-Related Hearing Loss

The impact of phonological rules on Arabic speech recognition

Enhanced Running Spectrum Analysis for Robust Speech Recognition Under Adverse Conditions: A Case Study on Japanese Speech

Electro-Tactile Stimulation Enhances Cochlear Implant Speech Recognition in Noise

A generic neural acoustic beamforming architecture for robust multi-channel speech processing

Ego-Noise Suppression for Robots Based on Semi-Blind Infinite Non-Negative Matrix Factorization

Bayesian feature enhancement using independent vector analysis and reverberation parameter re-estimation for noisy reverberant speech recognition

Dual‐channel VTS feature compensation for noise‐robust speech recognition on mobile devices

Phone Synchronous Speech Recognition With CTC Lattices

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Integration of Optimized Modulation Filter Sets Into Deep Neural Networks for Automatic Speech Recognition
