Abstract

With the development of deep neural networks, multi-channel speech separation techniques with fixed array geometries have achieved remarkable performance. However, distributed microphone array processing remains challenging because it requires the network to process inputs with varying dimensions. To address this problem, we propose a triple-path recurrent neural network (TPRNN) with multi-scale aggregation blocks for multi-channel speech separation with distributed microphone arrays. First, we extend the single-channel dual-path recurrent neural network by adding multi-scale aggregation blocks and adaptive feature fusion blocks. Next, a third path along the spatial dimension is introduced to model spatial information. In this way, TPRNN can alternately and iteratively perform inter-channel, intra-chunk, and inter-chunk modeling. The experimental results show that the proposed approach outperforms other advanced baselines on multi-channel speech separation and enhancement tasks using spatially distributed microphones.
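The triple-path idea described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration (the layer names, dimensions, and normalization choices are assumptions, not taken from the paper): a 5-D tensor of shape [batch, mics, features, intra-chunk, inter-chunk] is reshaped so that a bidirectional LSTM runs along one axis at a time, first across microphones, then within chunks, then across chunks. Because the microphone axis is treated as a sequence, the same weights apply to any number of channels, which is what makes the scheme usable with distributed arrays of varying size.

```python
# Hypothetical sketch of one triple-path block; names and hyperparameters
# are illustrative, not the paper's. The multi-scale aggregation and
# adaptive feature fusion blocks mentioned in the abstract are omitted.
import torch
import torch.nn as nn


class PathRNN(nn.Module):
    """Bidirectional LSTM along one axis, with projection, residual, and norm."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, feat_dim)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x):            # x: [batch*, seq, feat]
        y, _ = self.rnn(x)
        return self.norm(x + self.proj(y))


class TriplePathBlock(nn.Module):
    """Alternates inter-channel, intra-chunk, and inter-chunk modeling."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.channel_rnn = PathRNN(feat_dim, hidden_dim)  # across microphones
        self.intra_rnn = PathRNN(feat_dim, hidden_dim)    # within each chunk
        self.inter_rnn = PathRNN(feat_dim, hidden_dim)    # across chunks

    def forward(self, x):
        # x: [B, M, N, K, S] = batch, mics, features, intra-chunk, inter-chunk
        B, M, N, K, S = x.shape
        # Path 1, inter-channel: sequence axis = M (any microphone count).
        y = x.permute(0, 3, 4, 1, 2).reshape(B * K * S, M, N)
        x = self.channel_rnn(y).reshape(B, K, S, M, N).permute(0, 3, 4, 1, 2)
        # Path 2, intra-chunk: sequence axis = K (local modeling).
        y = x.permute(0, 1, 4, 3, 2).reshape(B * M * S, K, N)
        x = self.intra_rnn(y).reshape(B, M, S, K, N).permute(0, 1, 4, 3, 2)
        # Path 3, inter-chunk: sequence axis = S (long-range modeling).
        y = x.permute(0, 1, 3, 4, 2).reshape(B * M * K, S, N)
        x = self.inter_rnn(y).reshape(B, M, K, S, N).permute(0, 1, 4, 2, 3)
        return x                     # same shape as the input
```

Stacking several such blocks gives the alternating, iterative modeling the abstract describes; since every path preserves the tensor shape, the block works unchanged when the number of microphones M varies between utterances.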
