This paper presents a new blind speech separation algorithm that uses beamforming to extract each individual speech signal from a mixture of three speech sources in a room. The algorithm uses the steered response power with phase transform (SRP-PHAT) to obtain a localization estimate for each speech source in the frequency domain. Based on those estimates, each desired speech signal is extracted from the mixture using an optimal beamforming technique. To solve the permutation problem, a permutation alignment algorithm based on the mutual output correlation groups the output signals from each frequency bin into the correct sources. Evaluations using real speech recordings in a room environment show that the proposed algorithm offers a high level of interference suppression whilst maintaining a low level of distortion for each desired signal.
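
The permutation alignment step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the separated outputs are available as a complex array of shape (bins, sources, frames), and it assumes that "mutual output correlation" can be approximated by correlating magnitude envelopes against a running reference. The function name `align_permutations` is hypothetical.

```python
import itertools
import numpy as np


def align_permutations(outputs):
    """Reorder the sources in each frequency bin so that index i tracks one source.

    outputs: complex array of shape (n_bins, n_sources, n_frames).
    Assumption: a source's magnitude envelope is correlated across bins.
    """
    n_bins, n_src, _ = outputs.shape
    aligned = outputs.copy()
    # Reference envelopes: running mean over the bins aligned so far.
    ref = np.abs(aligned[0])
    for k in range(1, n_bins):
        env = np.abs(outputs[k])
        # Cross-correlation block: rows are reference sources, columns are candidates.
        corr = np.corrcoef(ref, env)[:n_src, n_src:]
        # Exhaustive search is cheap for three sources (3! = 6 orderings).
        best = max(
            itertools.permutations(range(n_src)),
            key=lambda p: sum(corr[i, p[i]] for i in range(n_src)),
        )
        aligned[k] = outputs[k, list(best)]
        ref = (ref * k + np.abs(aligned[k])) / (k + 1)
    return aligned
```

The exhaustive search over orderings is practical only for a small number of sources, such as the three considered here; a larger source count would call for an assignment solver instead.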