On the use of blind channel response estimation and a residual neural network to detect physical access attacks to speaker verification systems

Anderson R Avila,Jahangir Alam,Fabiano O Costa Prado,Douglas O’Shaughnessy,Tiago H Falk

doi:10.1016/j.csl.2020.101163

Abstract

Spoofing attacks have been acknowledged as a serious threat to automatic speaker verification (ASV) systems. In this paper, we are specifically concerned with replay attack scenarios. As a countermeasure to the problem, we propose a front-end based on the blind estimation of the channel response magnitude and as a back-end a residual neural network. Our hypothesis is that the magnitude response of the channel, obtained by subtracting the log-magnitude spectrum of the observed signal from the prediction of the log-magnitude spectrum average of the observed signal’s clean counterpart, will capture the nuances of room ambiences, recordings and playback devices. The performance of these features is investigated on a benchmark back-end, based on a Gaussian mixture model and on a deep neural network classifier. Our experiments are performed on the 2017 and 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof) datasets. The benchmark systems are the same as used in the challenges and are based on constant-Q cepstral coefficients (CQCC) and linear-frequency cepstral coefficients (LFCC) features. Experimental results on the 2017 dataset show that the proposed method outperforms the two benchmarks, providing equal-error rates (EER) as low as 7.57% and 11.64%, respectively, for the development and evaluation sets. On the ASVspoof 2019 dataset, in turn, the proposed method outperformed the benchmark using a residual neural network as back-end by yielding tandem detection cost function (t-DCF) and EER as low as 0.1086 and 4.26% on the evaluation set. Lastly, an instrumental (objective) quality assessment is performed on the two datasets and the impact of quality variability on spoofing detection accuracy is discussed.

Full Text