Abstract

Technological progress and the proliferation of sophisticated software have made it easier than ever to spoof a person’s voice, and audio in general. Like other biometrics, speaker verification is vulnerable to spoofing attacks. Detecting these attacks using the artifacts present in the recordings is a major challenge. The current trend in spoofing detection is to use deep learning architectures that perform end-to-end detection, with a pooling layer that aggregates frame-level information into utterance-level embeddings. Typically, only first-order, or first- and second-order, statistics are pooled across the temporal dimension. In this paper, we investigate the influence of higher-order statistics, such as third- and fourth-order moments, on spoofing detection performance. A Time Delay Neural Network (TDNN) architecture is used on top of linear frequency cepstral coefficients to carry out spoofing detection experiments on the ASVspoof2019 challenge logical access and physical access corpora. Experimental results, in terms of equal error rate (EER) and minimum tandem detection cost function (min-tDCF), show that including higher-order statistics helps improve the performance of spoofing detection systems.
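To make the pooling idea concrete, the following is a minimal NumPy sketch of a statistics pooling step extended with third- and fourth-order moments. It is an illustration only, not the authors' implementation: the function name `higher_order_stats_pooling`, the use of standardized moments (skewness and kurtosis), the `eps` stabilizer, and the concatenation order are all assumptions, since the abstract does not specify these details.

```python
import numpy as np


def higher_order_stats_pooling(frames, eps=1e-8):
    """Pool frame-level features of shape (T, D) into one utterance-level vector.

    Hypothetical helper: returns the per-dimension mean, standard deviation,
    skewness (3rd standardized moment) and kurtosis (4th standardized moment)
    concatenated into a vector of shape (4 * D,).
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)                         # 1st-order statistic
    centered = frames - mean
    std = np.sqrt((centered ** 2).mean(axis=0) + eps)  # 2nd-order statistic
    z = centered / std                                 # standardized frames
    skew = (z ** 3).mean(axis=0)                       # 3rd-order statistic
    kurt = (z ** 4).mean(axis=0)                       # 4th-order statistic
    return np.concatenate([mean, std, skew, kurt])


# Usage: 200 frames of 8-dimensional features become a 32-dimensional embedding.
rng = np.random.default_rng(0)
embedding = higher_order_stats_pooling(rng.normal(size=(200, 8)))
print(embedding.shape)
```

In a network such as a TDNN, a step like this would sit between the frame-level layers and the utterance-level layers, so the layer that follows it would see an input four times as wide as with mean-only pooling.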

