Contrastive Learning based Speech Spoofing Detection for Multimedia Security in Edge Intelligence

Jiaqi Sun, Jong Hyuk Park, Yuanyuan He, Xianjun Deng, Yongling Huang, Xiaoxuan Fan, Celimuge Wu, Shenghao Liu

doi:10.1145/3698773

Abstract

Artificial intelligence (AI)-empowered edge computing has given rise to a new paradigm and has effectively facilitated the development of multimedia applications. The speech assistant is one of the significant services these applications provide, offering intelligent interaction between humans and machines. However, malicious attackers may use spoofed speech to deceive speech assistants, which poses a serious challenge to the security of multimedia applications. The limited resources of multimedia terminal devices make it difficult for them to load speech spoofing detection models, while processing and analyzing speech in the cloud can result in poor real-time performance and potential privacy risks. In addition, existing speech spoofing detection methods rely heavily on annotated data and generalize poorly to unseen spoofed speech. To address these challenges, this paper first proposes the Coordinate Attention Network (CA2Net), which consists of coordinate attention blocks and Res2Net blocks. CA2Net simultaneously extracts temporal and spectral speech feature information and represents multi-scale speech features at a granular level. The paper also proposes GEMINI, a contrastive learning-based speech spoofing detection framework that can be effectively deployed on edge nodes and autonomously learns speech features with strong generalization capabilities. GEMINI first performs data augmentation on speech signals and extracts conventional acoustic features to enhance feature robustness. It then uses CA2Net to further explore discriminative speech features, and employs a tensor-based multi-attention comparison model to maximize the consistency between speech contexts. GEMINI continuously updates CA2Net through contrastive learning, which enables CA2Net to represent speech signals effectively and detect spoofed speech accurately. Extensive experiments on the ASVspoof2019 dataset show that, compared with peer methods, GEMINI reduces the Equal Error Rate (EER) and tandem Detection Cost Function (t-DCF) by up to 96.75% and 96.35%, respectively, in the physical access scenario, and by up to 86.62% and 87.71%, respectively, in the logical access scenario.
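
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an Equal Error Rate and a relative reduction of the kind quoted above can be computed. It is not the paper's evaluation code, and it does not reproduce the official ASVspoof2019 scoring tools. The function names `equal_error_rate` and `relative_reduction`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when its score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: spoofed trials that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials that fail the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is the operating point where FAR and FRR meet;
            # with finitely many scores we take the closest point.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of a metric with respect to a baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy, made-up detector scores (not taken from the paper).
bonafide = [0.9, 0.8, 0.7, 0.6, 0.4]
spoof = [0.5, 0.3, 0.2, 0.1, 0.05]

eer = equal_error_rate(bonafide, spoof)
print(f"EER on toy scores: {eer:.2f}")
print(f"Relative reduction from 0.20 to 0.05: {relative_reduction(0.20, 0.05):.2f}%")
```

Under this reading, a figure such as "reduces the EER by up to 96.75%" is a relative reduction against a peer method's EER, not an absolute difference in percentage points.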

More From: ACM Transactions on Multimedia Computing, Communications, and Applications

Similar Papers

Population responses in primary auditory cortex simultaneously represent the temporal envelope and periodicity features in natural speech
Daniel A Abrams ... Nina Kraus
Hearing Research | VOL. 348
17 Feb 2017

Performance Analysis of Deep Learning Based Speech Quality Model with Mixture of Features
Rahul Jaiswal
01 Dec 2022

Improved Voice Activity Detection based on support vector machine with high separable speech feature vectors
Y. X. Zou ... Wei Shi
01 Aug 2014

An improved noise-robust voice activity detector based on hidden semi-Markov models
Yuan Liang ... Baosong Shan
Pattern Recognition Letters | VOL. 32
21 Feb 2011
