Abstract

In multi-speaker scenarios, speech processing tasks such as speaker identification and speech recognition are susceptible to noise and overlapped voices. Because overlapped voices form a complicated mixture of signals, extracting the target speech from this mixture is an effective front-end solution for downstream processing such as understanding and classification. The quality of speech separation can be assessed by objective measures such as signal-to-noise-based ratios or by subjective scoring, and it can also be assessed by the accuracy of downstream tasks such as speaker identification. To make the separation model and the speaker identification model better adapted to complex multi-speaker overlapping scenarios, this research investigates the speech separation model and incorporates it with a voiceprint recognition task. This paper proposes a feature-scale single-channel speech separation network connected to a back-end speaker verification network that uses MFCCT features, so that speaker identification accuracy serves as an indicator of speech separation quality. The datasets are prepared by synthesizing VoxCeleb1 data and are used for training and testing. The results show that using an objective downstream evaluation effectively improves overall performance, as the optimized speech separation model significantly reduces the speaker verification error rate.
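The sketch below illustrates, under stated assumptions, how such a pipeline can be evaluated: a mixture is passed through a separation front end, the estimated target is scored against an enrolled speaker, and the equal error rate (EER) of verification is reported as the downstream quality measure. The names separation_model and speaker_encoder are hypothetical placeholders for the trained networks described in the paper, and the time-averaged MFCC vector stands in for the MFCCT feature; none of these details are taken from the paper itself.

# Hypothetical sketch: score speech separation by downstream speaker verification.
# separation_model and speaker_encoder are placeholder callables, not real APIs.

import numpy as np
import librosa
from sklearn.metrics import roc_curve

def mfcc_features(wave, sr=16000, n_mfcc=20):
    """Frame-level MFCCs averaged over time (simplified stand-in for MFCCT)."""
    mfcc = librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def equal_error_rate(labels, scores):
    """EER: the operating point where false acceptance equals false rejection."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0

def evaluate(trials, separation_model, speaker_encoder, sr=16000):
    """trials: iterable of (mixture_wave, enrollment_wave, is_same_speaker)."""
    labels, scores = [], []
    for mixture, enrollment, is_same in trials:
        # Front end: estimate the target source from the overlapped mixture.
        estimated = separation_model(mixture)                        # placeholder
        # Back end: embed both utterances and score the verification trial.
        emb_test = speaker_encoder(mfcc_features(estimated, sr))     # placeholder
        emb_enroll = speaker_encoder(mfcc_features(enrollment, sr))  # placeholder
        scores.append(cosine_score(emb_test, emb_enroll))
        labels.append(int(is_same))
    return equal_error_rate(np.array(labels), np.array(scores))

In this arrangement, a lower EER returned by evaluate indicates that the separation front end preserved more speaker-discriminative information, which is the sense in which verification accuracy is used as a proxy for separation quality.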
