Abstract
Voice signals acquired by a microphone array often contain considerable noise and mutual interference, which seriously degrade the accuracy and speed of speech separation. Traditional beamforming is simple to implement, but its suppression of interfering sources is inadequate. In contrast, independent component analysis (ICA) can improve separation, but it requires an iterative and time-consuming process to calculate the separation matrix. As a supporting method, principal component analysis (PCA) helps reduce dimensionality, speed up computation, and discard false sound sources. Exploiting the sparsity of frequency components in a mixed signal, we propose an adaptive fast speech separation algorithm that uses multiple sound source localization as preprocessing to select between beamforming and frequency-domain ICA according to the mixing conditions of each frequency bin. First, a fast localization algorithm estimates the maximum number of components per frequency bin of the mixed speech signal to prevent the occurrence of false sound sources. Then, PCA reduces the dimension to adaptively adjust the weights of beamforming and ICA for speech separation. Finally, the ICA separation matrix is initialized from the sound source localization results, which notably reduces the iteration time and mitigates permutation ambiguity. Simulation and experimental results verify the effectiveness and speedup of the proposed algorithm.
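The per-bin source counting described above can be illustrated with a minimal sketch: eigenvalues of the spatial covariance matrix of one STFT frequency bin that fall far below the largest eigenvalue are treated as noise, which caps the number of components and discards false sound sources. This is not the paper's exact procedure; the threshold `ratio` and the covariance estimate are illustrative assumptions.

```python
import numpy as np

def count_sources_per_bin(X_bin, n_max, ratio=0.1):
    """Estimate the number of active sources in one STFT frequency bin.

    X_bin : (n_mics, n_frames) complex STFT coefficients of the bin.
    n_max : upper bound on the source count (e.g., from localization).
    ratio : eigenvalues below ratio * largest are treated as noise
            (an assumed heuristic threshold, not the paper's criterion).
    """
    # sample spatial covariance matrix of this frequency bin
    R = X_bin @ X_bin.conj().T / X_bin.shape[1]
    # eigenvalues of the Hermitian covariance, sorted descending
    eigvals = np.linalg.eigvalsh(R)[::-1]
    # count dominant eigenvalues, capped by the localization bound
    return min(n_max, int(np.sum(eigvals > ratio * eigvals[0])))
```

For a rank-one mixture (a single active source in the bin) the estimate collapses to one regardless of the microphone count, which is what prevents weak noise directions from being mistaken for sources.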
Highlights
Speech separation aims at the effective extraction of target speech and removal of noise and interference
An adaptive fast speech separation algorithm is proposed that uses multiple sound source localization as preprocessing and selects between beamforming and frequency-domain independent component analysis (ICA) according to the mixing conditions of each frequency bin
We propose an adaptive and fast speech separation algorithm based on ICA and beamforming
Summary
Speech separation aims at the effective extraction of target speech and the removal of noise and interference. Simple fixed beamforming performs poorly for speech separation in real environments, and because the steering vectors [5] of the sound sources are not orthogonal, the interference suppression of adaptive beamforming depends on accurate estimation of the propagation process [6]. Alterations, such as the use of masks, can improve interference removal, but they can degrade the target signal components [7]. To solve these problems, and exploiting the sparsity of frequency components in a mixed signal, we propose an adaptive speech separation algorithm that uses multiple sound source localization as preprocessing and selects either beamforming or frequency-domain ICA according to the characteristics of each frequency bin. The fourth section presents simulations and experiments, and conclusions are drawn in the fifth section.
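The adaptive per-bin selection can be sketched as follows: each frequency bin is classified by PCA of its spatial covariance; a bin with one dominant eigenvalue is handled by a simple beamformer, while a multi-source bin is whitened and passed to a frequency-domain ICA iteration. This is a minimal illustration, not the paper's implementation: the eigenvalue threshold, the natural-gradient update, the `tanh` nonlinearity, and the step size are all assumptions, and the identity initialization stands in for the localization-based initialization the paper uses to cut iterations.

```python
import numpy as np

def separate_bin(X, n_src_max, eig_ratio=0.1, W_init=None, n_iter=50, step=0.1):
    """Adaptively separate one STFT frequency bin (illustrative sketch).

    X         : (n_mics, n_frames) complex STFT coefficients of the bin.
    n_src_max : source-count bound from localization preprocessing.
    W_init    : optional ICA initialization (e.g., from steering vectors).
    Returns (separated signals, estimated source count).
    """
    # PCA: eigendecomposition of the spatial covariance, sorted descending
    R = X @ X.conj().T / X.shape[1]
    w, V = np.linalg.eigh(R)
    w, V = w[::-1], V[:, ::-1]
    n_src = min(n_src_max, int(np.sum(w > eig_ratio * w[0])))
    if n_src <= 1:
        # one dominant source: beamform along the principal eigenvector
        # (a stand-in for a localization-based steering vector)
        b = V[:, :1]
        return b.conj().T @ X, 1
    # whiten into the n_src-dimensional signal subspace
    Q = np.diag(1.0 / np.sqrt(w[:n_src])) @ V[:, :n_src].conj().T
    Z = Q @ X
    # frequency-domain ICA with natural-gradient updates
    W = np.eye(n_src, dtype=complex) if W_init is None else W_init
    for _ in range(n_iter):
        Y = W @ Z
        # complex nonlinearity: compress magnitude, keep phase
        G = np.tanh(np.abs(Y)) * np.exp(1j * np.angle(Y))
        dW = (np.eye(n_src) - (G @ Y.conj().T) / Y.shape[1]) @ W
        W = W + step * dW
    return W @ Z, n_src
```

Because single-source bins skip the ICA loop entirely, the iterative cost is paid only where the mixture is genuinely multi-source, which is the source of the speedup the abstract claims.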