LaSNet: An end-to-end network based on steering vector filter for sound source localization and separation

Xiaokang Yang, Hongcheng Zhang, Yufei Lu, Ying A, Guangyi Ren, Jianguo Wei, Xianliang Wang, Wei Li

doi:10.1016/j.apacoust.2023.109562

Abstract

In this paper, we propose LaSNet, a novel time-domain end-to-end network for multiple sound source localization (SSL) and separation with a microphone array. Traditional time-frequency (T-F) signal representations depend on various prior conditions and fail to separate the different sound signal components. Even data-driven neural networks have not produced an effectively integrated approach in which localization and separation interact to address both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), which extracts latent features from a separation network and feeds them into a localization network. We then propose a simple multi-task network for both SSL and separation. By analyzing the steering vector filter, we find that the localization and separation problems can be linked through the pseudo-inverse (pinv) operation. To let SSL and sound separation reinforce each other while keeping the network trainable end to end, we develop a Pinv Module (PM). Finally, we propose the Localization and Separation Network (LaSNet). Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks, and a fuzzy separation loss function is introduced for training the multi-layer network. Numerical experiments show that the proposed method clearly outperforms several well-known models: LaSNet substantially improves both separation and localization performance while reducing model size by at least 32% relative to the baseline models.
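The pseudo-inverse link between localization and separation mentioned in the abstract can be illustrated with the classical narrowband, far-field mixing model. In this model, one frequency bin of the array signal is X = A S, and each column of the steering matrix A is the steering vector for one source direction. The following is a minimal NumPy sketch under those assumptions: a uniform linear array with 6 microphones and 4 cm spacing, a single 1 kHz bin, a noise-free mixture, and known directions of arrival (DOAs). The helper names `ula_steering_matrix`, `separate_by_pinv`, and `projection_residual` are hypothetical, and this is not the authors' Pinv Module. With the correct DOAs, pinv(A) recovers the sources exactly. With a wrong DOA hypothesis, part of the mixture is left unexplained, which is one way to see why the two tasks are coupled.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed in air


def ula_steering_matrix(doas_deg, n_mics, spacing, freq):
    """Far-field steering matrix A (n_mics x n_sources) of a uniform linear array at one frequency bin."""
    doas = np.deg2rad(np.asarray(doas_deg, dtype=float))
    mic_pos = np.arange(n_mics) * spacing  # microphone positions along the array axis (m)
    # Relative delay of each source at each microphone, referenced to microphone 0.
    delays = np.outer(mic_pos, np.sin(doas)) / SPEED_OF_SOUND  # shape (n_mics, n_sources)
    return np.exp(-2j * np.pi * freq * delays)


def separate_by_pinv(X, A):
    """Least-squares source estimate S_hat = pinv(A) @ X."""
    return np.linalg.pinv(A) @ X


def projection_residual(X, A):
    """Energy of X that the columns of A cannot explain (zero when A matches the true DOAs)."""
    return float(np.linalg.norm(X - A @ separate_by_pinv(X, A)) ** 2)


rng = np.random.default_rng(0)
n_mics, n_frames = 6, 200
true_doas = [-30.0, 40.0]

# Two complex sources mixed by the true steering matrix (noise-free, assumed).
A_true = ula_steering_matrix(true_doas, n_mics, spacing=0.04, freq=1000.0)
S = rng.standard_normal((2, n_frames)) + 1j * rng.standard_normal((2, n_frames))
X = A_true @ S

# Separation given the correct DOAs: pinv(A) A = I, so the sources are recovered.
S_hat = separate_by_pinv(X, A_true)
print(np.allclose(S_hat, S))  # True

# A wrong DOA hypothesis (-10 deg instead of -30 deg) leaves part of the mixture unexplained.
A_wrong = ula_steering_matrix([-10.0, 40.0], n_mics, spacing=0.04, freq=1000.0)
res_true = projection_residual(X, A_true)
res_wrong = projection_residual(X, A_wrong)
print(res_true < 1e-8 * np.linalg.norm(X) ** 2, res_wrong > res_true)  # True True
```

Scanning candidate DOA sets for the smallest `projection_residual` is a textbook least-squares localizer. The sketch only illustrates the underlying linear-algebra relationship, not how the Pinv Module is embedded in an end-to-end network.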
