Voice Conversion Attacks Research Articles

Voice user interfaces (VUIs) bring efficiency and convenience to Internet of Things (IoT) applications, but they also raise increasingly serious security issues. Word-level voice liveness detection has proved to be a promising way to thwart spoofing attacks. However, complex acoustic features, diverse attacks, and varying interaction distances can severely limit detection accuracy. To alleviate this issue, we develop a novel pop-noise-based word-level voice liveness detection framework. First, a new voice frame selection method is proposed for determining the optimal frames; it comprises a short-time Fourier transform, low-frequency average energy computation, and sequencing. Then, the acoustic features of the selected frames are computed as Gammatone frequency cepstral coefficients (GFCC). Finally, based on these features, a newly built joint voice detector, which fuses a self-attentional residual network (ResNet) with a light gradient boosting machine (LightGBM), achieves accurate voice classification. On popular voice spoofing attack datasets, experimental results show that our proposal significantly outperforms the baseline and state-of-the-art models, and that it is gender dependent. Moreover, our proposal generalizes well to far-field replay attacks, speech synthesis and voice conversion attacks, and partial fake voice attacks. Finally, its effectiveness is verified by an ablation study.
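The frame selection step named in the abstract above (short-time Fourier transform, low-frequency average energy, sequencing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `select_low_freq_frames`, the Hann window, the 512-sample frame, the 100 Hz cutoff, and the choice to keep the highest-energy frames are all assumptions made for illustration, since the abstract does not state these details.

```python
import numpy as np

def select_low_freq_frames(signal, sample_rate, frame_len=512, hop=256,
                           cutoff_hz=100.0, top_k=5):
    """Rank frames by average low-frequency energy and return the top_k indices.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Short-time Fourier transform: windowed frames -> one-sided power spectrum.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Average energy over the bins at or below the (assumed) cutoff frequency.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    low_band = freqs <= cutoff_hz
    avg_energy = power[:, low_band].mean(axis=1)

    # "Sequencing": sort frames by low-frequency energy, highest first.
    order = np.argsort(avg_energy)[::-1]
    return order[:top_k], avg_energy


if __name__ == "__main__":
    # Synthetic example: faint noise with a 50 Hz burst standing in for pop noise.
    rng = np.random.default_rng(0)
    sr = 16000
    sig = 0.001 * rng.standard_normal(sr)
    sig[4096:8192] += np.sin(2 * np.pi * 50 * np.arange(4096) / sr)
    top, _ = select_low_freq_frames(sig, sr, top_k=3)
    print("selected frame indices:", top)
```

In a full pipeline the selected frames would then be passed to a GFCC feature extractor and the classifier; those stages are omitted here because the abstract gives no implementation details for them.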


The voice conversion (VC) approach, which morphs the voice of a source speaker so that it is perceived as spoken by a specified target speaker, can be used intentionally to deceive speaker identification (SID) and speaker verification (SV) systems that rely on speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. We use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems, and GMM supervector with support vector machine (GMM-SVM) based SV systems. Voice conversion is conducted using three different techniques: a GMM-based VC technique, a weighted frequency warping (WFW) based conversion method, and a variation of the latter in which energy correction is disabled (WFW−). Evaluation uses intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from the TIMIT database. The result is indicated by the degradation in the percentage of correct identification (POC) score for SID systems and the degradation in equal error rate (EER) for all SV systems. Experimental results show that the GMM-SVM SV systems are more resilient to voice conversion spoofing attacks than the GMM-UBM SV systems, and that all SID and SV systems are more vulnerable to GMM-based conversion than to WFW and WFW− based conversion. The results also indicate that, in general terms, all SID and SV systems are slightly more robust to voices converted through cross-gender conversion than through intra-gender conversion. This work also extends the study to examine the relationship between the VC objective score and SV system performance on the CMU ARCTIC database, which is a parallel corpus. The results of this experiment suggest an approach to quantifying the objective score of voice conversion that can be related to its ability to spoof an SV system.
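Since the SV results above are reported as changes in equal error rate (EER), a short sketch of how an EER can be estimated from verification scores may help. This is a generic threshold sweep, not the evaluation code used in the paper; the function name `equal_error_rate` and the toy scores are hypothetical, and higher scores are assumed to mean "more likely the claimed speaker".

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))

    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores (made up): one genuine and one impostor trial cross over.
    genuine = [0.6, 0.8, 0.9, 0.4]
    impostor = [0.1, 0.3, 0.5, 0.7]
    print(equal_error_rate(genuine, impostor))  # 0.25
```

Under a spoofing attack, converted speech is scored as impostor trials; if those scores shift toward the genuine range, the EER computed this way rises, which is the kind of degradation the abstract refers to.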



Articles published on Voice Conversion Attacks

A Self-Attentional ResNet-LightGBM Model for IoT-Enabled Voice Liveness Detection

Artificial speech detection using image-based features and random forest classifier

ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech

A Kernel Density Estimation Based Loss Function and its Application to ASV-Spoofing Detection

On the study of replay and voice conversion attacks to text-dependent speaker verification

On robustness of speech based biometric systems against voice conversion attack
