As voice-based authentication becomes increasingly integrated into security frameworks, establishing effective defenses against voice spoofing, and replay attacks in particular, is more crucial than ever. This paper presents a comprehensive framework for replay attack detection that integrates advanced spectral-temporal feature extraction with graph-based feature processing. The proposed system combines a waveform encoder with a novel temporal residual unit to extract spectral and temporal features synchronously. A selective attention graph followed by multi-scale feature synthesis then retains precise, spoof-indicative feature vectors at the classification layer, addressing the key challenge of distinguishing genuine speech from replayed recordings. The model is validated on the ASVspoof 2019 dataset, where it outperforms existing methods, achieving a lower equal error rate (EER) of 0.015 and a reduced tandem detection cost function (t-DCF) of 0.503. These comparative results demonstrate the robustness of the method in identifying replay attacks.
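To make the described pipeline concrete, the sketch below outlines one plausible realization in PyTorch: a strided-convolution waveform encoder, a temporal residual unit, a single-head attention layer standing in for the selective attention graph, and a two-scale pooling step standing in for multi-scale feature synthesis. This is an illustrative assumption of the architecture, not the authors' implementation; all module names, layer sizes, and hyperparameters here are hypothetical.

```python
# Hypothetical sketch of the described pipeline (not the authors' code):
# waveform encoder -> temporal residual unit -> graph attention -> multi-scale fusion -> classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalResidualUnit(nn.Module):
    """Residual 1-D convolution block operating along the time axis (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x):                      # x: (batch, channels, time)
        residual = x
        x = F.relu(self.norm(self.conv1(x)))
        x = self.conv2(x)
        return F.relu(x + residual)            # residual connection preserves temporal detail

class SelectiveGraphAttention(nn.Module):
    """Single-head attention over frame nodes; a stand-in for the selective attention graph."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, nodes, dim)
        scores = self.query(x) @ self.key(x).transpose(1, 2) / x.size(-1) ** 0.5
        weights = scores.softmax(dim=-1)       # attention weights act as a soft adjacency over frame nodes
        return weights @ self.value(x)

class ReplayDetector(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Waveform encoder: strided 1-D convolutions learn spectral-like features from raw audio.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=128, stride=64), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.temporal = TemporalResidualUnit(channels)
        self.graph = SelectiveGraphAttention(channels)
        self.classifier = nn.Linear(2 * channels, 2)   # bona fide vs. replayed

    def forward(self, wav):                            # wav: (batch, 1, samples)
        feats = self.temporal(self.encoder(wav))       # (batch, channels, frames)
        nodes = feats.transpose(1, 2)                  # treat frames as graph nodes
        attended = self.graph(nodes)
        # Multi-scale synthesis (assumed): pool attended features at two scales and concatenate.
        pooled = torch.cat([attended.mean(dim=1), attended.max(dim=1).values], dim=-1)
        return self.classifier(pooled)

# Example: score a batch of 4 one-second utterances at 16 kHz.
logits = ReplayDetector()(torch.randn(4, 1, 16000))
print(logits.shape)   # torch.Size([4, 2])
```

The logits would typically be converted to countermeasure scores and evaluated with EER and t-DCF on the ASVspoof 2019 protocol, as reported in the abstract.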