Separation Front-end Research Articles

Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
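The abstract above contrasts BISS with earlier pipelines in which brain decoding and speech separation are optimized independently. As a rough illustration of that earlier, independent style only (not of the BISS model itself), the sketch below assumes the separation front-end has already produced one envelope per talker and that a decoder has reconstructed an envelope from neural data; it then picks the talker whose envelope correlates best with the decoded one. The function names `pearson` and `select_attended`, the correlation-based selection rule, and the toy envelopes are all assumptions made for illustration and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_attended(decoded_envelope, separated_envelopes):
    """Return the index of the separated talker that best matches the
    brain-decoded envelope, together with all correlation scores.

    Hypothetical helper: selection by maximum correlation is an assumption.
    """
    scores = [pearson(decoded_envelope, env) for env in separated_envelopes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy envelopes (made up): the decoded envelope is a noisy copy of talker A.
talker_a = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]
talker_b = [0.8, 0.2, 0.9, 0.1, 0.6, 0.3]
decoded = [0.2, 0.8, 0.5, 0.7, 0.1, 0.6]

attended_index, scores = select_attended(decoded, [talker_a, talker_b])
print(attended_index, scores)
```

In this two-stage arrangement the separation step never sees the neural data, which is the independence the abstract says BISS removes by using the decoded information directly in the front-end.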


Robustness against noise and reverberation is critical for ASR systems deployed in real-world environments. In robust ASR, corrupted speech is normally enhanced using speech separation or enhancement algorithms before recognition. This paper presents a novel joint training framework for speech separation and recognition. The key idea is to concatenate a deep neural network (DNN) based speech separation frontend and a DNN-based acoustic model to build a larger neural network, and jointly adjust the weights in each module. This way, the separation frontend is able to provide enhanced speech desired by the acoustic model and the acoustic model can guide the separation frontend to produce more discriminative enhancement. In addition, we apply sequence training to the jointly trained DNN so that the linguistic information contained in the acoustic and language models can be back-propagated to influence the separation frontend at the training stage. To further improve the robustness, we add more noise- and reverberation-robust features for acoustic modeling. At the test stage, utterance-level unsupervised adaptation is performed to adapt the jointly trained network by learning a linear transformation of the input of the separation frontend. The resulting sequence-discriminative jointly-trained multistream system with run-time adaptation achieves 10.63% average word error rate (WER) on the test set of the reverberant and noisy CHiME-2 dataset (task-2), which represents the best performance on this dataset and a 22.75% error reduction over the best existing method.
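The joint training idea described above (stack a separation frontend and an acoustic model, then update both from one recognition loss) can be sketched on a toy scale. The example below is a minimal illustration under strong assumptions: each module is reduced to a single scalar affine layer, the "acoustic model" is a binary logistic classifier, and the data are made up. It does not reproduce the paper's DNNs, sequence training, multistream features, or test-time adaptation; the name `train_jointly` and all parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_jointly(noisy, labels, steps=100, lr=0.1):
    """Jointly train a toy frontend and a toy acoustic model.

    frontend:       enhanced = w_f * noisy + b_f
    acoustic model: p = sigmoid(w_a * enhanced + b_a)
    The cross-entropy loss on p is back-propagated through BOTH modules.
    """
    w_f, b_f = 1.0, 0.0   # frontend parameters (assumed initial values)
    w_a, b_a = 0.1, 0.0   # acoustic-model parameters (assumed initial values)
    n = len(noisy)
    losses = []
    for _ in range(steps):
        g_wf = g_bf = g_wa = g_ba = 0.0
        loss = 0.0
        for x, y in zip(noisy, labels):
            enhanced = w_f * x + b_f             # frontend forward pass
            p = sigmoid(w_a * enhanced + b_a)    # acoustic-model forward pass
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            d_z = p - y                          # gradient at the classifier input
            g_wa += d_z * enhanced
            g_ba += d_z
            d_enh = d_z * w_a                    # gradient passed back to the frontend
            g_wf += d_enh * x
            g_bf += d_enh
        losses.append(loss / n)
        # One gradient step that updates both modules together.
        w_f -= lr * g_wf / n
        b_f -= lr * g_bf / n
        w_a -= lr * g_wa / n
        b_a -= lr * g_ba / n
    return (w_f, b_f, w_a, b_a), losses


# Made-up one-dimensional "noisy features" with binary class labels.
noisy = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

params, losses = train_jointly(noisy, labels)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

The line computing `d_enh` is where the acoustic model's error signal reaches the frontend, which is the mechanism the abstract credits for steering the frontend toward more discriminative enhancement.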


Articles published on Separation Front-end

Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

Correlator beamforming for multipath mitigation in high‐fidelity GNSS monitoring applications

Building APMv3 Map Visualization Using Nagios Host Data

A Joint Training Framework for Robust Automatic Speech Recognition

Exploring rapid radiochemical separations at the University of Tennessee Radiochemistry Center of Excellence

A separation-based UI architecture with a DSL for role specialization

Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

Estimating Uncertainty to Improve Exemplar-Based Feature Enhancement for Noise Robust Speech Recognition

Real-time 6-DOF multi-session visual SLAM over large-scale environments

A New Evidence Model for Missing Data Speech Recognition With Applications in Reverberant Multi-Source Environments

Quantitative Analysis of a Common Audio Similarity Measure

Problems in expediting protocol processing: performance analysis of kernel mode priority scheduling and front end protocol processors
