Abstract
Due to their simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. To improve the performance of an E2E model, the local and sequential properties of speech should be efficiently taken into account during modeling. However, in most current E2E models for SE, these properties are either not fully considered or too complex to realize. In this letter, we propose an efficient E2E SE model, termed WaveCRN. Compared with models based on convolutional neural networks (CNN) or long short-term memory (LSTM), WaveCRN uses a CNN module to capture the locality features of speech and a stacked simple recurrent units (SRU) module to model the sequential property of those locality features. Unlike conventional recurrent neural networks and LSTM, the calculations of SRU can be efficiently parallelized, and the model requires even fewer parameters. To more effectively suppress the noise components in noisy speech, we derive a novel restricted feature masking approach that performs enhancement on the feature maps in the hidden layers; this differs from the common practice in speech separation of applying an estimated ratio mask to the noisy spectral features. Experimental results on speech denoising and compressed speech restoration tasks confirm that, with the SRU and the restricted feature mask, WaveCRN performs comparably to other state-of-the-art approaches with notably reduced model complexity and inference time.
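To make the described pipeline concrete, below is a minimal PyTorch sketch of a WaveCRN-style enhancer: a 1-D convolutional encoder extracts local feature maps from the raw waveform, a bidirectional recurrent module models their sequential structure, a restricted (bounded) mask is applied to the hidden feature maps, and a transposed-convolution decoder reconstructs the waveform. All names, layer sizes, and the tanh bound on the mask are illustrative assumptions, not the authors' exact configuration; nn.LSTM is used here only as a runnable stand-in for the stacked bidirectional SRU layers the paper actually uses (available, e.g., via the `sru` PyPI package).

```python
import torch
import torch.nn as nn

class WaveCRNSketch(nn.Module):
    """Minimal sketch of a WaveCRN-style E2E enhancer (illustrative only)."""

    def __init__(self, channels=256, kernel_size=64, stride=32, rnn_hidden=256):
        super().__init__()
        # 1-D conv encoder: captures the local (frame-like) structure of the waveform.
        self.encoder = nn.Conv1d(1, channels, kernel_size,
                                 stride=stride, padding=kernel_size // 2)
        # Recurrent module over the feature-map time axis. The paper stacks
        # bidirectional SRU layers here; nn.LSTM is a stand-in for this sketch.
        self.rnn = nn.LSTM(channels, rnn_hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        # Projects RNN outputs to a per-channel mask over the feature maps.
        self.mask_proj = nn.Linear(2 * rnn_hidden, channels)
        # Transposed conv decoder: maps masked feature maps back to a waveform.
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size,
                                          stride=stride, padding=kernel_size // 2)

    def forward(self, wav):                       # wav: (batch, samples)
        x = self.encoder(wav.unsqueeze(1))        # (batch, channels, frames)
        r, _ = self.rnn(x.transpose(1, 2))        # (batch, frames, 2*rnn_hidden)
        # "Restricted" feature mask: bounded (here via tanh, an assumption) and
        # applied to the hidden feature maps rather than to a spectrogram.
        mask = torch.tanh(self.mask_proj(r)).transpose(1, 2)
        return self.decoder(x * mask).squeeze(1)  # (batch, samples)

# Usage: enhance a batch of 1-second, 16 kHz noisy waveforms.
model = WaveCRNSketch()
noisy = torch.randn(4, 16000)
enhanced = model(noisy)
```

Masking the encoder's feature maps rather than a magnitude spectrogram is what keeps the pipeline fully end-to-end: no STFT analysis/synthesis step is needed, and the mask bound keeps the element-wise product from blowing up waveform-domain features that can be negative.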
Highlights
Speech-related applications, such as automatic speech recognition (ASR), voice communication, and assistive hearing devices, play an important role in modern society.
We propose an E2E waveform-mapping-based speech enhancement (SE) method using an alternative CRN, termed WaveCRN, which combines the advantages of convolutional neural networks (CNN) and recurrent neural networks (RNN).
We aim to show that simple recurrent units (SRU) are superior to long short-term memory (LSTM) in terms of denoising capability and computational efficiency when applied to waveform-based SE.
Summary
Speech-related applications, such as automatic speech recognition (ASR), voice communication, and assistive hearing devices, play an important role in modern society. Most of these applications are not robust in noisy conditions. One class of SE systems carries out enhancement on frequency-domain acoustic features; these are generally called spectral-mapping-based SE approaches. In these approaches, speech signals are analyzed and reconstructed using the short-time Fourier transform (STFT) and inverse STFT, respectively [9]–[13]. Deep learning models, such as the fully connected deep denoising autoencoder [3], convolutional neural networks (CNNs) [14], and recurrent neural networks (RNNs) and long short-term memory (LSTM) [15], [16], are used as a transformation function to convert noisy spectral features into clean ones.
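For contrast with the waveform-domain approach above, the following sketch shows the conventional spectral-mapping pipeline this paragraph describes: analyze the noisy waveform with the STFT, enhance the magnitude spectrogram with some model, and resynthesize with the inverse STFT using the noisy phase. The function name, the window/FFT parameters, and the identity "enhancer" are illustrative assumptions; any of the cited model families (DAE, CNN, RNN/LSTM) could fill the enhancement role.

```python
import torch

def spectral_mapping_enhance(noisy_wav, enhance_mag, n_fft=512, hop=128):
    """Generic spectral-mapping SE pipeline: STFT -> enhance magnitude -> iSTFT.

    enhance_mag: any callable mapping a noisy magnitude spectrogram
    (batch, freq, frames) to an enhanced one, e.g. a DAE, CNN, or LSTM.
    """
    window = torch.hann_window(n_fft)
    # Analysis: complex STFT of the noisy waveform.
    spec = torch.stft(noisy_wav, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    mag, phase = spec.abs(), spec.angle()
    # Enhancement operates on magnitudes only; the noisy phase is reused,
    # the usual (and inherently lossy) choice in spectral-mapping SE.
    enhanced_spec = torch.polar(enhance_mag(mag), phase)
    # Synthesis: inverse STFT back to the time domain.
    return torch.istft(enhanced_spec, n_fft=n_fft, hop_length=hop,
                       window=window, length=noisy_wav.shape[-1])

# Usage with a placeholder identity "enhancer" (assumed for illustration):
noisy = torch.randn(2, 16000)
enhanced = spectral_mapping_enhance(noisy, enhance_mag=lambda m: m)
```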