Discriminative Frequency Information Learning for End-to-End Speech Anti-Spoofing

Bingyuan Huang, Sanshuai Cui, Xiangui Kang, Jiwu Huang

doi:10.1109/lsp.2023.3251895

Abstract

End-to-end technology is an active research topic in speech anti-spoofing. Although end-to-end methods have achieved remarkable success in speech anti-spoofing, they are still troubled by channel effects introduced by telephony transmission and by certain challenging forms of spoofing attack. We observe that differences in the high-frequency components between bonafide and spoofed speech help detect some of the most troublesome attack forms, and that these differences remain after the signals are affected by transmission and codecs. Based on this observation, we aim to exploit the high-frequency information of speech signals to achieve better generalization to unknown attacks and stronger robustness against transmission and codecs. We propose a raw waveform processing module based on sinc convolution and multiple pre-emphasis to obtain discriminative shallow feature representations. Additionally, we propose an improved backbone to learn discriminative feature embeddings, and a feature classification loss that optimizes intra-class and inter-class distances simultaneously. Together, these modules constitute the proposed Discriminative Frequency-information SincNet (DFSincNet). The proposed algorithm demonstrates competitive performance on both the ASVspoof 2019 and 2021 logical access (LA) scenarios.
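To illustrate why pre-emphasis draws attention to high-frequency content, the following is a minimal sketch of the standard first-order pre-emphasis filter, y[n] = x[n] - alpha * x[n-1], applied more than once. It is not the DFSincNet module: the abstract does not specify how "multiple pre-emphasis" is configured, so the coefficient `alpha = 0.97`, the number of passes, and the function names are assumptions chosen for illustration only.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0].

    alpha = 0.97 is a commonly used value, assumed here for illustration.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def repeated_pre_emphasis(signal, alpha=0.97, passes=2):
    """Apply the filter `passes` times.

    This is one hypothetical reading of "multiple pre-emphasis";
    the paper's actual design may differ.
    """
    for _ in range(passes):
        signal = pre_emphasis(signal, alpha)
    return signal


# A constant (zero-frequency) signal is almost cancelled after the first samples,
# while a sample-by-sample alternating (highest-frequency) signal is amplified.
dc = [1.0] * 6
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dc_out = repeated_pre_emphasis(dc)
alt_out = repeated_pre_emphasis(alternating)

print("constant signal, last sample:   ", dc_out[-1])
print("alternating signal, last sample:", alt_out[-1])
```

In this sketch the low-frequency input is attenuated and the high-frequency input is boosted, which is the general property that makes pre-emphasis a plausible front end for a detector relying on high-frequency differences between bonafide and spoofed speech.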

Full Text