Novel energy separation based instantaneous frequency features for spoof speech detection

Madhu R Kamble, Hemant A Patil

doi:10.23919/eusipco.2017.8081178

Abstract

Speech Synthesis (SS) and Voice Conversion (VC) present a genuine risk of attack to Automatic Speaker Verification (ASV) technology. In this paper, we evaluate a front-end anti-spoofing technique to protect ASV systems against SS and VC attacks using a standard benchmarking database. In particular, we propose a novel feature set, namely, Energy Separation Algorithm-based Instantaneous Frequency Cosine Coefficients (ESA-IFCC), to distinguish genuine from impostor speech. The experiments are carried out on the ASVspoof 2015 Challenge database. On the development set, the score-level fusion of the proposed ESA-IFCC feature set with Mel Frequency Cepstral Coefficients (MFCC) gave an Equal Error Rate (EER) of 3.45 %, a significant reduction from MFCC alone (6.98 %) and ESA-IFCC alone (5.43 %), all with 13-D static features. The EER decreases further to 2.01 % and 1.89 % for Δ and ΔΔ features, respectively, derived from the proposed ESA-IFCC features. On the evaluation set, the overall average error rate across known and unknown attacks was 6.79 % for ESA-IFCC, significantly better than for MFCC features (9.15 %).
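
To make the energy-separation idea concrete, the following is a minimal sketch of instantaneous-frequency estimation with the Teager energy operator. It assumes the DESA-2 variant of the Energy Separation Algorithm; the abstract does not say which variant the authors use, and it does not describe the filterbank, framing, or cosine-transform stages, so none of those are shown. The function names `teager_energy` and `desa2_instantaneous_frequency` are hypothetical and chosen only for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2_instantaneous_frequency(x, fs):
    """Estimate instantaneous frequency in Hz with DESA-2 (an assumed variant).

    DESA-2 uses the symmetric difference y[n] = x[n+1] - x[n-1] and
    omega[n] ~= 0.5 * arccos(1 - psi(y[n]) / (2 * psi(x[n]))).
    The estimate is valid for narrowband signals with omega below pi / 2.
    """
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[i] corresponds to sample n = i + 1
    psi_y = teager_energy(y)  # psi_y[k] corresponds to sample n = k + 2

    freqs = []
    for k, energy_y in enumerate(psi_y):
        energy_x = psi_x[k + 1]  # align both operators on sample n = k + 2
        if energy_x <= 0.0:
            # The estimate is undefined where the energy is not positive.
            freqs.append(float("nan"))
            continue
        arg = 1.0 - energy_y / (2.0 * energy_x)
        arg = max(-1.0, min(1.0, arg))  # guard arccos against rounding error
        omega = 0.5 * math.acos(arg)  # radians per sample
        freqs.append(omega * fs / (2.0 * math.pi))
    return freqs


# Usage: a pure 1 kHz tone sampled at 16 kHz.
fs = 16000.0
tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(64)]
estimates = desa2_instantaneous_frequency(tone, fs)
```

Because energy separation assumes a single AM-FM component, an estimator of this kind is normally applied to band-pass filtered subband signals rather than to the full-band speech waveform.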

Full Text