Abstract

With the development of deep neural networks, multi-channel speech separation techniques with fixed array geometries have achieved remarkable performance. However, distributed microphone array processing remains challenging because it requires the network to process inputs of varying dimensions. To address this problem, we propose a triple-path recurrent neural network (TPRNN) with multi-scale aggregation blocks for multi-channel speech separation with distributed microphone arrays. First, we extend the single-channel dual-path recurrent neural network by adding multi-scale aggregation blocks and adaptive feature fusion blocks. Next, we introduce a third path along the spatial dimension to model spatial information. In this way, TPRNN can alternately and iteratively perform inter-channel, intra-chunk, and inter-chunk modeling. Experimental results show that the proposed approach outperforms advanced baselines on multi-channel speech separation and enhancement tasks using spatially distributed microphones.
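
The alternation of the three paths can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (channels, chunks, chunk length, features), the plain tanh RNN standing in for the recurrent layers, the residual connections, and the names `simple_rnn`, `path`, `triple_path_block`, and `make_params` are all assumptions made for illustration, and the multi-scale aggregation and adaptive feature fusion blocks are omitted. What it is meant to show is that treating the channel axis as one more sequence lets the same weights handle any number of microphones.

```python
import numpy as np


def simple_rnn(x, w_in, w_rec):
    """Run a tanh RNN over axis 1 of x, shape (batch, steps, feat).

    Stand-in (assumption) for whatever recurrent layer a real model uses.
    """
    h = np.zeros((x.shape[0], w_rec.shape[0]))
    outs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ w_in + h @ w_rec)
        outs.append(h)
    return np.stack(outs, axis=1)


def path(x, axis, w_in, w_rec):
    """Model x along one axis, with a residual connection (assumption).

    x has the assumed layout (channels, chunks, chunk_len, feat).
    """
    moved = np.moveaxis(x, axis, -2)  # (..., steps, feat)
    flat = moved.reshape(-1, moved.shape[-2], moved.shape[-1])
    out = simple_rnn(flat, w_in, w_rec).reshape(moved.shape)
    return x + np.moveaxis(out, -2, axis)


def triple_path_block(x, params):
    """One pass of inter-channel, intra-chunk, then inter-chunk modeling."""
    x = path(x, 0, *params["inter_channel"])  # across microphones
    x = path(x, 2, *params["intra_chunk"])    # within each chunk
    x = path(x, 1, *params["inter_chunk"])    # across chunks
    return x


def make_params(n_feat, seed=0):
    """Hypothetical helper: small random weights for the three paths."""
    rng = np.random.default_rng(seed)

    def pair():
        return (0.1 * rng.standard_normal((n_feat, n_feat)),
                0.1 * rng.standard_normal((n_feat, n_feat)))

    return {"inter_channel": pair(), "intra_chunk": pair(), "inter_chunk": pair()}
```

Because each path only ever sees a (batch, steps, features) view of the tensor, the weights returned by `make_params` do not depend on the number of channels, so the same `params` can be applied to recordings from 2 or 5 microphones alike. Stacking several such blocks would give the iterative behavior the abstract describes.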
