Frequency-Domain Blind Source Separation (FD-BSS) is an efficient way to analyze convolutively mixed speech. To improve the quality of the separated speech, this paper proposes a permutation alignment algorithm based on Dynamic Time Warping (DTW). Because signals in adjacent frequency bins are highly similar, DTW is used to compare them and to generate adjustment matrices that resolve the permutation ambiguity. The approach is evaluated in simulated and practical experiments using the Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR), Signal-to-Artifacts Ratio (SAR), and Perceptual Evaluation of Speech Quality (PESQ). To examine the quality of the separated speech in a practical acoustic environment, we also measure the accuracy of Automatic Speech Recognition (ASR). In the experiments, we compare our approach with classical permutation criteria such as Kullback-Leibler (K-L) divergence, envelope correlation, and higher-order statistics. The experimental results show that the proposed algorithm performs permutation alignment more accurately and improves the acoustic quality of the separated speech.
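The idea of comparing adjacent frequency bins with DTW can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each bin is summarized by one magnitude envelope per separated source, uses an absolute-difference local cost, and searches all source orderings exhaustively instead of building adjustment matrices. The names `dtw_distance` and `align_permutations` are hypothetical.

```python
import itertools
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def align_permutations(envelopes):
    """Choose, for each bin, the source ordering closest to the previous aligned bin.

    envelopes: array of shape (n_bins, n_sources, n_frames).
    Returns one permutation tuple per bin; bin 0 is taken as the reference.
    """
    n_bins, n_src, _ = envelopes.shape
    perms = [tuple(range(n_src))]
    ref = envelopes[0]
    for f in range(1, n_bins):
        # Exhaustive search over orderings: only practical for a few sources.
        best = min(
            itertools.permutations(range(n_src)),
            key=lambda p: sum(
                dtw_distance(ref[k], envelopes[f, p[k]]) for k in range(n_src)
            ),
        )
        perms.append(best)
        ref = envelopes[f, list(best)]  # the aligned bin becomes the next reference
    return perms
```

Applying the returned permutation to bin `f` (`envelopes[f, list(perms[f])]`) puts its sources in the same order as bin 0. Chaining references from bin to bin follows the adjacent-bin similarity assumption, but it also means a single misalignment can propagate to higher bins.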