Abstract

End-to-end technology is an active research topic in speech anti-spoofing. Although end-to-end methods have achieved remarkable success in this task, they are still hampered by channel effects introduced by telephony transmission and by certain challenging forms of spoofing attack. We observe that differences in the high-frequency components of bonafide and spoofed speech help detect some of the most troublesome attacks, and that these differences persist after the signals have been affected by transmission and codecs. Based on this observation, we exploit the high-frequency information in speech signals to improve generalization to unknown attacks and robustness to transmission and codec effects. We propose a raw waveform processing module based on sinc convolution and multiple pre-emphasis to obtain discriminative shallow feature representations. In addition, we propose an improved backbone to learn discriminative feature embeddings, and a feature classification loss that optimizes intra-class and inter-class distances simultaneously. Together, these modules constitute the proposed Discriminative Frequency-information SincNet (DFSincNet). The proposed method achieves competitive performance on the logical access (LA) scenarios of both ASVspoof 2019 and ASVspoof 2021.
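
The abstract does not specify how the raw waveform module is built, so the following is only an illustrative sketch of the two named ingredients, not the DFSincNet implementation. It assumes a first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1] applied with several coefficients, followed by fixed Hamming-windowed sinc band-pass filters (SincNet-style layers learn the band edges instead). The function names, the coefficients, the band edges, the kernel size, and the 16 kHz sample rate are all hypothetical choices made for illustration.

```python
import numpy as np

def pre_emphasis(x, alpha):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def sinc_bandpass(f_low, f_high, kernel_size, sample_rate):
    """Windowed-sinc band-pass kernel, built as the difference of two low-pass filters."""
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(t) is the normalized sinc, sin(pi * t) / (pi * t).
    low = 2 * f_low / sample_rate * np.sinc(2 * f_low / sample_rate * n)
    high = 2 * f_high / sample_rate * np.sinc(2 * f_high / sample_rate * n)
    return (high - low) * np.hamming(kernel_size)

def multi_pre_emphasis_sinc(x, alphas, bands, kernel_size=129, sample_rate=16000):
    """Apply each pre-emphasis coefficient, then each band-pass filter.

    Returns an array of shape (len(alphas), len(bands), len(x)).
    All default values are assumptions made for this sketch.
    """
    kernels = [sinc_bandpass(lo, hi, kernel_size, sample_rate) for lo, hi in bands]
    out = []
    for alpha in alphas:
        y = pre_emphasis(x, alpha)
        out.append(np.stack([np.convolve(y, k, mode="same") for k in kernels]))
    return np.stack(out)
```

Calling `multi_pre_emphasis_sinc(waveform, alphas=(0.0, 0.97), bands=((300, 3400), (4000, 7600)))` would yield one filtered copy of the waveform per coefficient and band. Larger pre-emphasis coefficients attenuate low frequencies more strongly, which is one plausible way to emphasize the high-frequency cues the abstract describes.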
