Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech

Madhu R Kamble, Hemlata Tak, Hemant A Patil

doi:10.1016/j.specom.2020.10.003

Abstract

Replay attacks pose a great threat to Automatic Speaker Verification (ASV) systems. This paper introduces Amplitude Modulation and Frequency Modulation-based features for the replay Spoof Speech Detection (SSD) task. In this context, we propose Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) features obtained using the Energy Separation Algorithm (ESA). Because speech is a combination of several monocomponent signals, the speech signal is passed through bandpass (subband) filters to obtain narrowband components. To obtain the narrowband filtered signals, we use linearly-spaced Butterworth and Gabor filterbanks. The instantaneous modulations help to understand the local characteristics of a non-stationary signal. The IA and IF components capture the information present in the slowly-varying amplitude envelope and the fast-varying frequency, respectively. For replay speech, the slow-varying temporal modulations have a distorted amplitude envelope, and the fast-varying temporal modulations do not preserve the harmonic structure, compared to the natural speech signal. The intermediate device characteristics and the acoustic environment also distort the spectral energy of replay speech relative to that of natural speech. Experiments were performed on the ASVspoof 2017 challenge version 2.0 database with a Gaussian Mixture Model (GMM) as the classifier. When the ESA-IACC and ESA-IFCC feature sets are fused with the Constant Q Cepstral Coefficients (CQCC) feature set at the score level, the EER further reduces to 11.93% and 10.12%, respectively, on the evaluation set. In addition, for the evaluation set, we studied the performance of the proposed feature sets on different Replay Configurations (RC), namely, acoustic environments, playback devices, and recording devices. For all levels of threat conditions to the ASV system (i.e., low, medium, and high), the proposed feature sets performed better than the existing state-of-the-art feature sets.
In addition to the ASVspoof 2017 Challenge database, we also performed experiments on other spoofing databases, namely, BTAS 2016, the ASVspoof 2019 Challenge database, and Real PA of the ASVspoof 2019 Challenge database. For all the spoofing databases used in this study, the proposed ESA-based feature sets perform significantly better than the other feature sets.
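
To make the energy-separation step mentioned in the abstract concrete, the following is a minimal sketch of how IA and IF can be estimated from a single narrowband signal with the Teager energy operator. It assumes the DESA-2 variant of the ESA; the abstract does not state which variant the authors use, so this choice, the function names `teager_energy` and `desa2`, and the test tone parameters are illustrative assumptions only. The subband filtering (Butterworth or Gabor) and the cepstral processing that produce ESA-IACC and ESA-IFCC are not shown; a pure cosine stands in for one subband output.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 samples, corresponding to n = 1 .. len(x) - 2.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, fs):
    """Estimate instantaneous amplitude and frequency with DESA-2.

    Assumption: x is already narrowband (e.g. one subband filter output)
    and its frequency stays below fs / 4, where DESA-2 is unambiguous.
    Returns (inst_amp, inst_freq_hz), each of length len(x) - 4.
    """
    x = np.asarray(x, dtype=float)
    eps = np.finfo(float).eps

    # Symmetric difference y[n] = x[n+1] - x[n-1], for n = 1 .. N-2.
    y = x[2:] - x[:-2]

    # Teager energy of x, trimmed so it aligns with psi_y (n = 2 .. N-3).
    psi_x = teager_energy(x)[1:-1]
    psi_y = teager_energy(y)

    # Guard against division by zero and negative energies from noise.
    psi_x = np.maximum(psi_x, eps)
    psi_y = np.maximum(psi_y, eps)

    # DESA-2: omega = 0.5 * arccos(1 - psi_y / (2 * psi_x)).
    arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(arg)          # radians per sample
    inst_freq_hz = omega * fs / (2.0 * np.pi)

    # DESA-2: |a| = 2 * psi_x / sqrt(psi_y).
    inst_amp = 2.0 * psi_x / np.sqrt(psi_y)
    return inst_amp, inst_freq_hz


# Illustrative check on a synthetic tone (hypothetical parameters).
fs = 16000.0
n = np.arange(512)
tone = 0.7 * np.cos(2.0 * np.pi * 1000.0 * n / fs + 0.3)
ia, inst_f = desa2(tone, fs)
```

For a constant-amplitude, constant-frequency tone such as the one above, the estimates recover the amplitude and frequency used to synthesize it. For real speech, the estimates are meaningful only per subband, which is why the narrowband filtering described in the abstract precedes the ESA.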

Full Text