To tackle the challenging task of extracting target signal features amid strong interference from complex, time-varying non-stationary noise in industrial scenarios, we propose a novel time-domain all-neural beamforming network tailored for non-stationary mechanical sound source separation, named TFANBNet. Leveraging time–frequency convolution and self-attention, TFANBNet promises robust performance in challenging acoustic environments. First, the proposed time–frequency convolution module employs parameterized complex-valued convolutional layers to simulate time–frequency transformations, thereby implicitly extracting time–frequency features. Second, the proposed adaptive attention transformer network strengthens local attention interaction by integrating intermediate features, thereby improving long-range context modeling. Third, the beamforming module uses deep neural networks to estimate beamforming weights, replacing conventional noise covariance matrix computations, which have inherent performance limitations. Finally, experimental results on a multi-channel spatial reverberation dataset synthesized from the MIMII dataset show that the proposed TFANBNet yields superior separation performance and generalization capability compared to the considered competitive methods.
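As a minimal sketch of the implicit time–frequency extraction idea, the PyTorch snippet below implements a learnable complex-valued 1-D convolutional frontend: two real-valued convolutions act as the real and imaginary parts of a complex filterbank. All names (ComplexConvFrontend, n_filters, kernel_size, stride) and the magnitude-feature output are illustrative assumptions; the paper's actual module may be parameterized differently.

```python
import torch
import torch.nn as nn

class ComplexConvFrontend(nn.Module):
    """Hypothetical learnable complex-valued analysis transform.

    Two real-valued Conv1d layers play the roles of the real and imaginary
    parts of a complex filterbank, so the frontend behaves like a trainable
    STFT (one plausible reading of a "parameterized complex-valued
    convolutional layer"; not the paper's verified implementation).
    """

    def __init__(self, n_filters=256, kernel_size=512, stride=256):
        super().__init__()
        self.conv_real = nn.Conv1d(1, n_filters, kernel_size, stride, bias=False)
        self.conv_imag = nn.Conv1d(1, n_filters, kernel_size, stride, bias=False)

    def forward(self, waveform):
        # waveform: (batch, 1, samples) time-domain input
        real = self.conv_real(waveform)   # (batch, n_filters, frames)
        imag = self.conv_imag(waveform)
        # Magnitude of the implicit complex spectrum as a T-F feature map
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-8)

# Usage: extract T-F-like features from a 1-second, 16 kHz waveform
frontend = ComplexConvFrontend()
x = torch.randn(4, 1, 16000)
features = frontend(x)  # (4, 256, frames)
```

Because the filterbank is trained end to end rather than fixed, the network can adapt its time–frequency resolution to the non-stationary noise it encounters, which is the motivation stated for simulating the transform rather than applying a fixed STFT.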