In the field of speech separation, traditional single-channel and multi-channel methods have made great progress, but their separation accuracy and the resulting automatic speech recognition (ASR) rates remain unsatisfactory. With the development of neural networks, researchers have turned to deep learning for speech separation. Although such methods improve separation accuracy, they require pre-training, incur higher computational complexity, and suffer degraded performance when the trained model does not match the mixed signal. This paper studies the multi-speaker separation scenario in depth and proposes a new dual-channel speech separation algorithm based on the Comb-Filter Effect (CFE). The CFE arises when a signal passes through a first-order differential microphone (FDM) array, and this effect is identified and exploited here for the first time. Using this effect, the paper designs a new spectrum estimation method that accurately estimates the speech signal's spectrum and combines it with traditional spectral subtraction to achieve separation. Finally, the proposed algorithm is compared with a traditional FastICA-based algorithm and a fully-convolutional time-domain audio separation network (Conv-TasNet)-based algorithm. Simulation and comparison experiments show that the proposed algorithm effectively separates the two speech signals while greatly reducing computational complexity, and that it is highly robust. Across the tested conditions, it achieves an average Scale-Invariant Signal-to-Noise Ratio improvement (SI-SNRi) of 9.19 dB, and improves the Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) of the separated speech by averages of at least 33% and 70%, respectively.
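To make the comb-filter effect concrete, the following is a minimal illustrative sketch, not the paper's implementation: it uses the textbook delay-and-subtract model of a first-order differential array, in which the array response has periodically spaced spectral nulls (a comb shape) whose positions depend on the source direction. The microphone spacing d, source angle theta, and far-field delay model below are assumptions made for illustration only.

```python
import numpy as np

# Textbook first-order differential array: y = x1 - x2, where x2 arrives
# delayed by tau = d*cos(theta)/c for a far-field source. The resulting
# frequency response H(f) = 1 - exp(-j*2*pi*f*tau) has magnitude
# 2*|sin(pi*f*tau)|, i.e. a comb with nulls at f = k/tau, k = 0, 1, 2, ...
c = 343.0                      # speed of sound in air (m/s)
d = 0.05                       # assumed microphone spacing (m)
theta = 0.0                    # assumed source angle, endfire (rad)
tau = d * np.cos(theta) / c    # inter-microphone propagation delay (s)

f = np.linspace(0.0, 8000.0, 2048)              # analysis frequencies (Hz)
H = 1.0 - np.exp(-1j * 2.0 * np.pi * f * tau)   # differential-array response
mag_db = 20.0 * np.log10(np.abs(H) + 1e-12)     # comb-shaped magnitude (dB)

# In this model the null positions shift with theta, so sources from
# different directions are shaped by different combs.
print(f"first non-DC null near {1.0 / tau:.0f} Hz")
```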