Robustness of DAS Beamformer Over MVDR for Replay Attack Detection On Voice Assistants

Piyushkumar K Chodingala, Hemant A Patil, Ankur T Patil, Shreya S Chaturvedi

doi:10.1109/spcom55316.2022.9840757

Abstract

With the growing use of Virtual Assistants (VAs) for personal tasks, protecting VAs against spoofing attacks is of utmost importance. To that end, we investigate the significance of the Delay-and-Sum (DAS) beamformer over the state-of-the-art Minimum Variance Distortionless Response (MVDR) beamformer, together with Teager Energy Operator (TEO)-based features, for replay Spoof Speech Detection (SSD) on VAs. The conventional DAS method is known to suppress the additive noise component while retaining the reverberation effect (i.e., an important acoustic cue for replay SSD). In contrast, MVDR, as used for Distant Speech Recognition (DSR), suppresses both the reverberation effect and the additive noise. Hence, MVDR is not a suitable choice for replay SSD, whereas DAS can be exploited for replay SSD in VAs. Furthermore, the suppression of reverberation by the DAS vs. MVDR beamformers is analyzed via the TEO profile. Experimental validation is performed on the recently released Realistic Replay Attack Microphone-Array Speech Corpus (ReMASC) and its DAS and MVDR beamformed versions. The Teager Energy Cepstral Coefficients (TECC) feature set is employed, as it was recently shown to capture the characteristics of reverberation for the replay SSD task. For performance comparison, the Constant-Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC) feature sets are used along with a Gaussian Mixture Model (GMM) classifier. In particular, the TECC-GMM SSD system on DAS-beamformed data gave a relative reduction in %EER of 13.12% and 43.16% on the Eval set compared to the original ReMASC and its MVDR beamformed version, respectively. Finally, the relative significance of TECC with respect to practical deployment is shown through a latency analysis of various SSD systems for VAs.
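As a rough illustration of the two signal-processing ideas named in the abstract, the following minimal Python sketch shows a time-domain delay-and-sum beamformer and the standard discrete-time Teager energy operator, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). It is not the authors' implementation: the function names `delay_and_sum` and `teager_energy` are hypothetical, and the sketch assumes that the per-microphone delays are already known and are whole numbers of samples.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Time-domain delay-and-sum: undo each channel's delay, then average.

    channels: array of shape (num_mics, num_samples)
    delays:   integer delay in samples of each mic relative to the reference
              (assumed known here; in practice they must be estimated)
    """
    channels = np.asarray(channels, dtype=float)
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for ch, d in zip(channels, delays):
        d = int(d)
        # Advance the channel by d samples. np.roll wraps around, so the
        # wrapped samples are zeroed to avoid leaking the end into the start.
        shifted = np.roll(ch, -d)
        if d > 0:
            shifted[-d:] = 0.0
        elif d < 0:
            shifted[:-d] = 0.0
        out += shifted
    return out / num_mics


def teager_energy(x):
    """Discrete Teager energy operator: x(n)^2 - x(n-1) * x(n+1).

    The output is two samples shorter than the input because the first
    and last samples have no two-sided neighbours.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

Averaging the aligned channels reinforces the coherent direct-path signal and attenuates noise that is uncorrelated across microphones. The sketch makes no attempt to dereverberate the signal, which matches the abstract's point that DAS retains the reverberation cue. For a pure tone A cos(Omega n + phi), `teager_energy` returns the constant A^2 sin^2(Omega), which is why the operator is commonly used to track the energy of a signal.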

Full Text