Speech separation is crucial for effective speech processing in multi-talker conditions, especially in real-time, low-latency applications. In this study, the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN) are used for time-domain multi-speaker speech separation. Conventional recurrent neural networks (RNNs) cannot accurately model long sequences, and one-dimensional convolutional neural networks (1-D CNNs) cannot perform utterance-level sequence modeling when the sequence length exceeds their receptive field. DPRNN splits the long sequential input into smaller chunks and iteratively applies intra-chunk and inter-chunk operations, so that the input length of each operation is proportional to the square root of the original sequence length. The resulting model is more efficient than earlier systems and improves performance on the LibriMix dataset. The experiments show that DPRNN with sample-level, time-domain separation can replace existing methods, and that EEND-SS and other separation algorithms perform worse than DPRNN. The proposed model achieves an SI-SDR of 12.376, a STOI (short-time objective intelligibility) of 0.969, an SDR of 12.363, a DER of 9.363, and an SCA of 97.193.
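The dual-path chunking described above can be illustrated with a short PyTorch sketch. This is a minimal, illustrative example rather than the configuration used in this work: the names (`DualPathBlock`, `segment`), the feature and hidden dimensions, and the chunk-size heuristic are assumptions chosen for clarity.

```python
# Minimal sketch (not the paper's implementation) of DPRNN-style dual-path
# processing: split a long sequence into ~sqrt-length chunks, then alternate
# an intra-chunk RNN (within each chunk) and an inter-chunk RNN (across chunks).
import math
import torch
import torch.nn as nn


class DualPathBlock(nn.Module):
    """One dual-path block: intra-chunk BLSTM followed by inter-chunk BLSTM."""

    def __init__(self, feature_dim: int, hidden_dim: int):
        super().__init__()
        self.intra_rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True,
                                 bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden_dim, feature_dim)
        self.inter_rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True,
                                 bidirectional=True)
        self.inter_proj = nn.Linear(2 * hidden_dim, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_chunks, chunk_len, feature_dim)
        b, s, k, d = x.shape

        # Intra-chunk pass: each chunk is processed as an independent short sequence.
        intra = x.reshape(b * s, k, d)
        intra, _ = self.intra_rnn(intra)
        x = x + self.intra_proj(intra).reshape(b, s, k, d)  # residual connection

        # Inter-chunk pass: sequences are formed across chunks at each position.
        inter = x.permute(0, 2, 1, 3).reshape(b * k, s, d)
        inter, _ = self.inter_rnn(inter)
        inter = self.inter_proj(inter).reshape(b, k, s, d).permute(0, 2, 1, 3)
        return x + inter  # residual connection


def segment(features: torch.Tensor, chunk_len: int) -> torch.Tensor:
    """Split (batch, length, feature_dim) into 50%-overlapping chunks."""
    b, length, d = features.shape
    hop = chunk_len // 2
    pad = (hop - length % hop) % hop + hop
    features = torch.nn.functional.pad(features, (0, 0, hop, pad))
    chunks = features.unfold(1, chunk_len, hop)      # (b, num_chunks, d, chunk_len)
    return chunks.permute(0, 1, 3, 2).contiguous()   # (b, num_chunks, chunk_len, d)


if __name__ == "__main__":
    torch.manual_seed(0)
    encoded = torch.randn(2, 4000, 64)               # (batch, frames, feature_dim)
    chunk_len = int(math.sqrt(2 * encoded.shape[1])) # chunk length ~ sqrt of sequence length
    chunk_len += chunk_len % 2                       # keep even for 50% overlap
    chunks = segment(encoded, chunk_len)
    block = DualPathBlock(feature_dim=64, hidden_dim=128)
    out = block(chunks)
    print(chunks.shape, out.shape)
```

Because each intra-chunk sequence has length ~sqrt(L) and the inter-chunk sequence has a comparable number of chunks, both RNN passes operate on inputs far shorter than the original sequence, which is the efficiency gain the abstract refers to.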