Abstract

Commercial speaker verification systems are an important component of security services in domains such as law enforcement, government, and finance. These systems are sensitive to noise in the input signal, which can lead to inaccurate verification results and, in turn, to security breaches. Traditional speech enhancement (SE) methods have been employed to improve the performance of speaker verification systems. However, to the best of our knowledge, the impact of state-of-the-art SE techniques on text-independent automatic speaker verification (ASV) systems has not been analyzed using real-world utterances. Our contribution in this work is twofold. First, we propose two deep neural network (DNN) architectures for SE and compare their performance with existing work, evaluating the resulting SE networks with two objective measures: perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). Second, we analyze the performance of ASV systems when SE methods are used as front-end processing to suppress non-stationary background noise. We compare the equal error rate (EER) obtained with our DNN-based SE approaches and with existing SE approaches, on both real customer data and the freely available RedDots dataset. Our results show that our DNN-based SE approaches improve speaker verification performance.
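The EER used above is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how it can be estimated from a list of verification scores. It is not the evaluation code used in this work; it assumes that a higher score means "same speaker", that a trial is accepted when its score is at or above the threshold, and the function name `equal_error_rate` is hypothetical. Because only observed scores are tried as thresholds, the result is an approximation that averages FAR and FRR at the closest crossing.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Estimate (EER, threshold) from genuine and impostor trial scores.

    Assumption: higher score = more likely the claimed speaker, and a trial
    is accepted when score >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        gap = abs(far - frr)
        # Keep the first threshold where FAR and FRR are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores: one impostor (0.75) outscores one genuine trial (0.4).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))  # -> (0.25, 0.7)
```

In an ASV evaluation like the one described, the same computation would be run once on scores from unprocessed audio and once on scores from SE-processed audio. A lower EER after enhancement indicates that the front end helped.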
