Abstract
In this paper, we propose a novel time-domain end-to-end network (LaSNet) for multiple sound source localization (SSL) and separation with a microphone array. The traditional time-frequency (T-F) signal representation depends on various prior conditions and fails to separate the different sound signal components. Even data-driven neural networks have not yet provided an effectively integrated approach in which localization and separation interact to address both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), which extracts latent features from a separation network and then uses them in a localization network. We then propose a simple multi-task network for both SSL and separation. Through an analysis of the steering vector filter, we find that the localization and separation problems can be linked by the pseudo-inverse (pinv) operation. To enable a synergistic relationship between SSL and sound separation while keeping the network trainable end to end, we develop a Pinv Module (PM). Finally, we propose the Localization and Separation Network (LaSNet). Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks. A fuzzy separation loss function is introduced to train the multi-layer network. Numerical experiments show that the proposed method clearly outperforms several well-known models. Compared with the baseline models, LaSNet substantially improves both separation and localization performance while reducing model size by at least 32% (relative).
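The abstract states that localization and separation can be linked through the pseudo-inverse of the steering-vector model but does not give the Pinv Module's formulation. The sketch below illustrates only the underlying linear-algebra idea, under assumptions that are ours and not the paper's: a single narrowband frequency bin (the paper works in the time domain), a far-field uniform linear array, a speed of sound of 343 m/s, and noiseless mixtures. If the mixture is modeled as X = A S, where the columns of A are steering vectors for the source directions, then separation given the directions is S_hat = pinv(A) X. Localization can be posed as choosing the directions whose pinv-based separation best reconstructs X. This is the classic projection-residual criterion, swapped in here for illustration. All function names (`ula_steering_matrix`, `pinv_separate`, `projection_residual`, `localize_by_grid`) and the grid search are hypothetical.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed in air

def ula_steering_matrix(doas_deg, num_mics, spacing_m, freq_hz):
    """Hypothetical far-field narrowband steering matrix A (num_mics x num_sources) for a ULA."""
    doas = np.deg2rad(np.asarray(doas_deg, dtype=float))
    mic_pos = np.arange(num_mics) * spacing_m                  # mic positions along the array axis (m)
    delays = np.outer(mic_pos, np.cos(doas)) / SPEED_OF_SOUND  # (M, K) relative delays in seconds
    return np.exp(-2j * np.pi * freq_hz * delays)

def pinv_separate(mixture, steering):
    """Separation given the DOAs: S_hat = pinv(A) @ X."""
    return np.linalg.pinv(steering) @ mixture

def projection_residual(mixture, steering):
    """Localization cost: energy of X not explained by A @ pinv(A) @ X."""
    recon = steering @ pinv_separate(mixture, steering)
    return float(np.linalg.norm(mixture - recon) ** 2)

def localize_by_grid(mixture, num_mics, spacing_m, freq_hz, grid_deg, num_sources):
    """Pick the DOA combination whose pinv-based separation best reconstructs the mixture."""
    return min(
        itertools.combinations(grid_deg, num_sources),
        key=lambda doas: projection_residual(
            mixture, ula_steering_matrix(doas, num_mics, spacing_m, freq_hz)),
    )

# Usage: two sources, six mics, one 1 kHz bin, noiseless synthetic mixture.
rng = np.random.default_rng(0)
M, T, spacing, freq = 6, 256, 0.04, 1000.0
true_doas = (40.0, 110.0)
A = ula_steering_matrix(true_doas, M, spacing, freq)
S = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))
X = A @ S

grid = np.arange(0.0, 181.0, 10.0)  # candidate DOAs in degrees
est_doas = localize_by_grid(X, M, spacing, freq, grid, num_sources=2)
S_hat = pinv_separate(X, ula_steering_matrix(est_doas, M, spacing, freq))
print([float(d) for d in est_doas], np.allclose(S_hat, S))
```

In this toy setting, the same pinv operation serves both tasks. It scores candidate directions for localization and produces the separated signals once the directions are fixed. With noise, reverberation, or off-grid sources, this simple estimator degrades. The abstract's motivation for a learned, end-to-end module is to address such conditions.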