This study examines how self-supervised learning and a novel Indonesian-language dataset enhance anti-spoofing systems. The results show improved model performance, with a lower Equal Error Rate (EER) during training, indicating effective learning from diverse audio samples. Analysis of the weighted cross-entropy loss indicates that the model minimizes training error robustly. Compared with baseline models that use English data, the proposed approach achieves a significantly lower EER, which is attributed to the incorporation of language-specific data. The distinctive phonetic features of the Indonesian language provide valuable training material, strengthening the system's defence against spoofing attacks. By including diverse speech samples, the dataset improves generalization across dialects and recording conditions. This makes the anti-spoofing system more adaptable, which matters in real-world applications where recording variability affects performance. The experimental setup used a balanced dataset of genuine and spoofed utterances from male and female speakers, ensuring high-quality input. The dataset was split into training, development, and testing sets, and the experiments ran on a high-performance computing setup. The proposed model achieved an EER of 0.33, compared with 7.65 for the traditional sinc-layer model and 0.82 for the wav2vec 2.0 model using English data. Overall, this research advances anti-spoofing solutions and emphasizes the need for diverse datasets and advanced learning approaches to improve automatic speaker verification systems in practical applications. Incorporating the Indonesian dataset is vital for addressing linguistic diversity challenges in biometric security and paves the way for future work in this area.
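For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (spoofed utterances accepted as genuine) equals the false rejection rate (genuine utterances rejected). The following is a minimal sketch of one common way to estimate it from detection scores, not the evaluation code used in the study. The function name `compute_eer`, the convention that higher scores mean "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Estimate the EER and its threshold.

    Assumes higher scores mean 'more likely genuine' (illustrative convention).
    Returns the EER as a fraction in [0, 1], not a percentage.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


# Toy scores (hypothetical): one genuine and one spoofed trial overlap.
genuine_demo = [0.9, 0.8, 0.7, 0.4]
spoof_demo = [0.1, 0.2, 0.3, 0.6]
eer_demo, threshold_demo = compute_eer(genuine_demo, spoof_demo)
```

With these toy scores, one of four genuine trials and one of four spoofed trials are misclassified at the selected threshold, so the estimate is 0.25. Published EER figures are often quoted as percentages, so a fraction returned by a function like this would be multiplied by 100 before being compared with such figures.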