Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection

Tomi Kinnunen, Rosa Gonzalez Hautamaki, Md Sahidullah, Ville Vestman

doi:10.1109/icassp.2019.8683811

Open Access

https://doi.org/10.1109/icassp.2019.8683811

Publication Date: Nov 28, 2018
Citations: 31
License type: pd

Affiliation: Finland University, University of Eastern Finland

Abstract

We consider technology-assisted mimicry attacks in the context of automatic speaker verification (ASV). We use ASV itself to select the target speakers to be attacked by human-based mimicry. We recorded 6 naive mimics, for each of whom we selected target celebrities from the VoxCeleb1 and VoxCeleb2 corpora (7,365 potential targets) using an i-vector system. The attacker attempts to mimic the selected target, and the resulting utterances are subjected to ASV tests using an independently developed x-vector system. Our main finding is negative: even though some of the attacker scores against the target speakers increased slightly, our mimics did not succeed in spoofing the x-vector system. Interestingly, however, the relative ordering of the selected targets (closest, furthest, median) is consistent between the two systems, which suggests some level of transferability between them.
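
The target selection step described in the abstract can be illustrated with a minimal sketch. It assumes each speaker is represented by a fixed-length embedding and that similarity is measured with cosine scoring. The abstract does not state the scoring back-end, so this is an assumption, and the names `cosine_score` and `select_targets` and the toy vectors are hypothetical. The sketch ranks all candidate targets against one attacker and picks the closest, median, and furthest.

```python
import math
from typing import Dict, List, Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_targets(attacker: Sequence[float],
                   candidates: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Rank candidates by similarity to the attacker, most similar first,
    and return the closest, median, and furthest speaker names."""
    ranked: List[str] = sorted(
        candidates,
        key=lambda name: cosine_score(attacker, candidates[name]),
        reverse=True,
    )
    return {
        "closest": ranked[0],
        "median": ranked[len(ranked) // 2],
        "furthest": ranked[-1],
    }


# Toy 2-D embeddings (hypothetical); real speaker embeddings are much larger.
attacker_vec = [1.0, 0.0]
target_vecs = {
    "spk_a": [0.9, 0.1],
    "spk_b": [0.5, 0.5],
    "spk_c": [0.0, 1.0],
    "spk_d": [-1.0, 0.2],
    "spk_e": [-0.2, -1.0],
}
selection = select_targets(attacker_vec, target_vecs)
print(selection)
```

In the study's setting the ranking would run over the 7,365 candidate targets for each of the 6 mimics, and the chosen targets would then be evaluated with a separate system; the sketch covers only the ranking step.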

Full Text