Abstract

Advanced speech synthesis techniques pose a serious threat to current speech-based biometric systems, creating an urgent demand for efficient countermeasures. However, existing detection features do not generalize well to synthetic speech created by unknown methods. An analysis of different synthesis techniques shows that long-term temporal relations and high-frequency information are useful for capturing more generalized and robust artifacts of synthetic speech. To leverage these clues, a novel time-frequency transform, the long-term variable Q transform (L-VQT), is proposed. The core idea of L-VQT is that the discrete frequencies are designed to follow a power function, which yields small frequency bandwidths, and hence long temporal windows, at low frequencies. In addition, sufficient high-frequency information can be obtained by properly varying the exponent of the power function. The log power spectrum feature based on L-VQT is then fed to a modified Densely Connected Convolutional Network (DenseNet), referred to as Light DenseNet, which serves as a single system to produce the detection result. Extensive experiments are conducted on the ASVspoof 2019 logical access corpus. The proposed detection scheme achieves better generalization than other state-of-the-art methods and remains more robust under noisy conditions.
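
The abstract does not give the exact frequency mapping, but a minimal sketch of the idea can be written as follows, assuming center frequencies of the form f_k = f_max * (k/K)^p. The function names, the window-length rule, and all parameter values below are illustrative assumptions for exposition, not the paper's definition:

```python
import numpy as np

# Hypothetical sketch of the L-VQT frequency design (not the paper's
# exact formulation): discrete center frequencies follow a power
# function f_k = f_max * (k / K)^p. For p > 1, the spacing between
# adjacent bins is small at low frequencies, while coverage still
# extends up to f_max at high frequencies.
def lvqt_center_frequencies(num_bins: int, f_max: float, exponent: float) -> np.ndarray:
    k = np.arange(1, num_bins + 1)
    return f_max * (k / num_bins) ** exponent

# Window length per bin is taken inversely proportional to the bin's
# bandwidth (spacing to the next bin), so the small low-frequency
# bandwidths yield the long temporal windows that capture long-term
# relations.
def lvqt_window_lengths(freqs: np.ndarray, sample_rate: int) -> np.ndarray:
    bandwidths = np.diff(freqs, append=2 * freqs[-1] - freqs[-2])
    return np.round(sample_rate / bandwidths).astype(int)

# Example: 16 kHz speech, 256 bins up to the Nyquist frequency.
freqs = lvqt_center_frequencies(num_bins=256, f_max=8000.0, exponent=2.0)
wins = lvqt_window_lengths(freqs, sample_rate=16000)
print(freqs[:3], wins[:3])    # low bins: closely spaced, long windows
print(freqs[-3:], wins[-3:])  # high bins: widely spaced, short windows
```

Under these assumptions, the log power spectrum feature would be log(|X_k|^2) over such bins, with the exponent p trading low-frequency resolution against high-frequency coverage.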
