Abstract

Recent work has shown that most automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. These attacks fall into three major categories: text-to-speech (TTS) synthesis, voice conversion (VC), and replay. Over the last few years, researchers have proposed many algorithms to detect such attacks on ASV systems and have made considerable improvements. This paper presents a novel approach to detecting bonafide and spoofed audio using machine learning and deep learning classifier models combined with resampling and data augmentation techniques. We use seven machine learning classifier models with different entropy features and three deep learning classifier models with various acoustic features. We analyze the performance of these models using precision, recall, accuracy, equal error rate (EER), loss, and the confusion matrix. Because state-of-the-art work reports EER as its performance measure, EER is our primary focus in this paper. With the proposed methodology, the deep learning classifier models show a considerable performance improvement. The training phase shows approximately 100% accuracy with a loss of 0.0090 and an EER of 0.06%. The evaluation phase shows approximately 98% accuracy with a loss of 0.0906 and an EER of 1.72%.
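Since the equal error rate is the primary metric here, a minimal sketch of how it is commonly computed may help. This is not the authors' implementation: the function name `equal_error_rate` is hypothetical, and the sketch assumes each trial has a scalar score where a higher value means "more likely bonafide". It sweeps candidate thresholds and reports the point where the false acceptance rate and false rejection rate are closest.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold). Assumes higher score = more bonafide-like."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold (sorted, unique).
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bonafide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    eer, thr = equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.75, 0.4])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With a finite set of trials the two error curves rarely cross exactly, so averaging the two rates at the closest point is one common approximation; interpolating between neighbouring thresholds is another.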
