Learning Efficient Representations for Fake Speech Detection

Nishant Subramani, Delip Rao

doi:10.1609/aaai.v34i04.6044

Abstract

Synthetic speech, or “fake speech”, that matches personal vocal traits has become better and cheaper due to advances in deep learning-based speech synthesis and voice conversion approaches. The increased accessibility of synthetic speech systems and their growing misuse highlight the critical need to build countermeasures. Furthermore, new synthesis models appear constantly, and previously trained detection models perform poorly on these unseen attack vectors. In this paper, we focus on two questions: 1) How can we build highly accurate, yet parameter- and sample-efficient, models for fake speech detection? 2) How can we rapidly adapt detection models to new sources of fake speech? We present four parameter-efficient convolutional architectures for fake speech detection, with best detection F1 scores of around 97 points on a large dataset of fake and bonafide speech. We show how the fake speech detection task naturally lends itself to a novel multi-task formulation that further improves F1 scores for a mere 0.5% increase in model parameters. Our multi-task setting also helps in data-sparse situations, which are commonplace in adversarial settings. We investigate an alternative approach to the data-sparsity problem using transfer learning and show that it is possible to match purely supervised detection performance on unseen attack vectors with as little as 6.25% of the training data. This is the first known application of transfer learning in adversarial settings for speech. Finally, we show that our transfer learning approach adapts well, in an instance-efficient way, to new attack vectors using the Real-Time Voice Cloning toolkit. We exceed the purely supervised detection performance (99.18 F1) with as little as 6.25% of the data.
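
The abstract does not specify the auxiliary task or the network layout. As a minimal sketch, assuming a shared encoder whose pooled features feed two linear heads (a binary bonafide/fake head and a hypothetical auxiliary head over k classes, for example the system that generated a sample), a multi-task objective can be written as a weighted sum of two cross-entropy losses. The names `multitask_loss`, `aux_head_param_count`, and `aux_weight` are illustrative assumptions, not the authors' code. The parameter count also suggests why such a head adds little: it costs only d·k + k weights on top of the shared encoder.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, labels):
    # Mean negative log-likelihood of integer class labels.
    rows = np.arange(logits.shape[0])
    return -log_softmax(logits)[rows, labels].mean()


def multitask_loss(features, w_bin, b_bin, w_aux, b_aux, y_bin, y_aux, aux_weight=1.0):
    """Hypothetical multi-task objective on shared encoder features.

    features: (batch, d) output of a shared encoder (not shown here).
    y_bin:    0 = bonafide, 1 = fake.
    y_aux:    labels for an assumed auxiliary task with k classes.
    """
    bin_logits = features @ w_bin + b_bin  # (batch, 2) bonafide/fake head
    aux_logits = features @ w_aux + b_aux  # (batch, k) assumed auxiliary head
    return cross_entropy(bin_logits, y_bin) + aux_weight * cross_entropy(aux_logits, y_aux)


def aux_head_param_count(d, k):
    # Extra weights added by one linear auxiliary head: a d x k matrix plus k biases.
    return d * k + k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, batch = 16, 4, 8  # illustrative sizes only
    feats = rng.normal(size=(batch, d))
    y_bin = rng.integers(0, 2, size=batch)
    y_aux = rng.integers(0, k, size=batch)
    loss = multitask_loss(
        feats,
        rng.normal(size=(d, 2)), np.zeros(2),
        rng.normal(size=(d, k)), np.zeros(k),
        y_bin, y_aux,
    )
    print(f"loss={loss:.4f}, extra head params={aux_head_param_count(d, k)}")
```

Setting `aux_weight=0.0` reduces this sketch to a plain binary detector, which makes it easy to compare single-task and multi-task variants on the same encoder.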

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 18

Similar Papers

Cross-lingual Transfer Learning and Multitask Learning for Capturing Multiword Expressions
Shiva Taslimipoor ... Omid Rohanian
01 Jan 2019

A Comprehensive Survey on Transfer Learning
Fuzhen Zhuang ... Zhiyuan Qi
Proceedings of the IEEE, Vol. 109, 16 Jul 2020

A Sample Size Extractor for RCT Reports
Fengyang Lin ... Hao Liu
Studies in Health Technology and Informatics, Vol. 290, 06 Jun 2022

Polyphonic Sound Event Detection Using Convolutional Bidirectional LSTM and Synthetic Data-based Transfer Learning
Seokwon Jung ... Jungbae Park
01 May 2019
