Abstract

Despite recent progress on neural network architectures for speech separation, the balance among model size, model complexity, and model performance remains an important and challenging problem for deploying such models on low-resource platforms. In this paper, we propose two simple modules, group communication and context codec, that can be easily applied to a wide range of architectures to jointly decrease the model size and complexity without sacrificing performance. A group communication module splits a high-dimensional feature into groups of low-dimensional features and captures the inter-group dependency. A separation module with a significantly smaller model size can then be shared by all the groups. A context codec module, containing a context encoder and a context decoder, is designed as a learnable downsampling and upsampling pair that shortens the sequential feature processed by the separation module. The combination of the group communication and context codec modules is referred to as the GC3 design. Experimental results show that applying GC3 to multiple speech separation architectures achieves on-par or better performance with as little as 2.5% of the model size and 17.6% of the model complexity.
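As a rough illustration of the context codec idea, the sketch below pairs a learnable downsampler with a matching upsampler so that the separation module only processes a shortened sequence. The paper's codec is built from recurrent layers; the strided 1-D convolutions, feature width, and context size used here are purely illustrative assumptions.

```python
# Minimal sketch of the context codec idea: a learnable downsampler shortens the
# sequence before the (expensive) separation module, and a matching upsampler
# restores the original length afterwards. Strided 1-D convolutions are used here
# only as a stand-in for the paper's RNN-based codec; all shapes and
# hyperparameters (feature_dim, context_size) are illustrative assumptions.
import torch
import torch.nn as nn


class ContextCodec(nn.Module):
    def __init__(self, feature_dim=64, context_size=4):
        super().__init__()
        # Encoder: compress every `context_size` frames into one summary frame.
        self.encoder = nn.Conv1d(feature_dim, feature_dim,
                                 kernel_size=context_size, stride=context_size)
        # Decoder: expand each summary frame back into `context_size` frames.
        self.decoder = nn.ConvTranspose1d(feature_dim, feature_dim,
                                          kernel_size=context_size, stride=context_size)

    def encode(self, x):          # x: (batch, feature_dim, num_frames)
        return self.encoder(x)    # -> (batch, feature_dim, num_frames // context_size)

    def decode(self, z):
        return self.decoder(z)    # -> (batch, feature_dim, num_frames)


# Usage: the separation module only sees the shorter sequence.
codec = ContextCodec()
mixture_feature = torch.randn(1, 64, 400)      # 400 frames
compressed = codec.encode(mixture_feature)     # 100 frames
# ... run the separation module on `compressed` ...
restored = codec.decode(compressed)            # back to 400 frames
```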

Highlights

  • Recent developments in neural network architectures have significantly advanced the state of the art in source separation performance

  • We introduce a context codec module that helps GroupComm maintain performance while further decreasing the number of MAC operations, accelerating training, and reducing memory consumption at both training and inference time

  • A residual bidirectional long short-term memory (BLSTM) layer, identical to the one used in the context codec, was selected for the GroupComm module in [35], and we present experimental results comparing the original dual-path RNN (DPRNN) time-domain audio separation network (TasNet), the GroupComm-equipped DPRNN-TasNet, and the GC3-equipped DPRNN-TasNet; a minimal sketch of such a residual BLSTM block follows below
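The following is a hedged sketch of a residual BLSTM block of the kind the highlight refers to: a bidirectional LSTM whose output is projected back to the input width and added to the input. The layer sizes and the use of layer normalization are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of a residual BLSTM block: BLSTM -> projection back to the input
# width -> residual addition -> normalization. Dimensions are illustrative.
import torch
import torch.nn as nn


class ResidualBLSTM(nn.Module):
    def __init__(self, input_dim=16, hidden_dim=32):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        self.proj = nn.Linear(hidden_dim * 2, input_dim)   # back to input width
        self.norm = nn.LayerNorm(input_dim)

    def forward(self, x):                   # x: (batch, seq_len, input_dim)
        y, _ = self.blstm(x)                # (batch, seq_len, 2 * hidden_dim)
        return self.norm(x + self.proj(y))  # residual connection


# Usage: keeps the sequence length and feature width unchanged.
block = ResidualBLSTM()
out = block(torch.randn(2, 100, 16))        # -> (2, 100, 16)
```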

Summary

INTRODUCTION

Recent developments in neural network architectures have significantly advanced the state of the art in source separation performance. GroupComm splits a high-dimensional feature, such as a spectrum, into groups of low-dimensional features, such as subband spectra, and uses the same separation model across all the groups for weight sharing. Another inter-group module is applied to capture the dependencies across the groups, so that the processing of each group always has access to the global information. The low-dimensional features allow a smaller module, e.g., a CNN or RNN layer, than the original high-dimensional feature would require, and together with weight sharing the total model size can be significantly reduced.
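A minimal sketch of the GroupComm idea described above: the feature is split into K groups, an inter-group module (here a small BLSTM running across the group axis) shares information between groups, and one small module is reused by every group. The specific module choices and dimensions are assumptions for illustration only.

```python
# Minimal GroupComm sketch: split an N-dimensional feature into K groups of N/K
# dimensions, mix information across groups with an inter-group BLSTM, then apply
# one shared small module to every group (weight sharing). Sizes are illustrative.
import torch
import torch.nn as nn


class GroupComm(nn.Module):
    def __init__(self, feature_dim=128, num_groups=8):
        super().__init__()
        assert feature_dim % num_groups == 0
        self.num_groups = num_groups
        self.group_dim = feature_dim // num_groups
        # Inter-group module: captures dependency across the K groups.
        self.inter_group = nn.LSTM(self.group_dim, self.group_dim // 2,
                                   batch_first=True, bidirectional=True)
        # Shared per-group module: one small network reused by every group.
        self.shared = nn.Linear(self.group_dim, self.group_dim)

    def forward(self, x):                      # x: (batch, frames, feature_dim)
        b, t, _ = x.shape
        g = x.view(b, t, self.num_groups, self.group_dim)
        # Run the inter-group BLSTM over the group axis, independently per frame.
        g = g.reshape(b * t, self.num_groups, self.group_dim)
        g, _ = self.inter_group(g)             # (b*t, K, group_dim)
        # Apply the same small module to every group (weight sharing).
        g = self.shared(g)
        return g.reshape(b, t, -1)             # back to (batch, frames, feature_dim)


out = GroupComm()(torch.randn(2, 100, 128))    # -> (2, 100, 128)
```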

Standard Pipeline for Speech Separation
Group Communication
GroupComm With Context Codec
Discussions
Data Simulation
Model Configurations
Training Configurations
Evaluation Metrics
RESULTS AND ANALYSIS
Experimental Results on GC3-DPRNN
Effect of Model Architectures for GroupComm
Effect of Overlap Between Groups
CONCLUSION AND FUTURE WORKS
