Deep neural networks are developing rapidly, and the field of biometric forgery is advancing alongside them. Systems capable of passing face verification are emerging and continuously improving, and deepfake videos and voice messages are being created. These developments can damage a person's reputation or cause serious security breaches. This paper proposes an approach for spoofing detection in voice biometrics using the ASVspoof2019 LA dataset. The model is trained and validated on subsets representing one type of attack and evaluated on a subset containing more advanced types of spoofing attacks, demonstrating the model's ability to generalize to more complex attack scenarios. Two models are proposed: the capsule-based ResCapsGuard and the TCN-based Res2TCNGuard. ResCapsGuard achieved an Equal Error Rate (EER) of 2.27, while Res2TCNGuard reached an EER of 1.49. Notebooks with our models are available in GitHub repositories. Because a random segment is cut from each audio recording, the results may vary between runs.