Abstract

The human voice is a dynamic signal that carries information useful for speaker identification, including cues to gender, age, emotion, and language. In the biometrics industry, identifying voices in real time across diverse accents, tones, and noisy backgrounds remains a challenging task. Voice biometry, a complex aspect of speaker identification, is gaining importance in applications such as user authentication, attendance systems, forensics, and banking operations, because it removes the need for traditional credentials like cards or passwords. Recent advances in Human–Computer Interaction technology have made conversational tasks technically feasible. Deep learning approaches, especially Convolutional Deep Neural Networks (CDNN), have emerged as powerful tools in speech processing and have surpassed traditional speaker identification methods. This paper introduces an approach that uses 1-dimensional convolutional residual blocks for audio classification and speaker identification, focusing specifically on recognizing speakers from spoken Hindi. The proposed residual architecture improves speaker identification even in low signal-to-noise ratio (SNR) environments, achieving an accuracy of 86.02%. This outperforms the traditional Gaussian Mixture Model (GMM) and Feed Forward Back-propagation Network (FFBN) models on the same set of speakers. Future research may explore audio classification and speaker identification using other acoustic features derived from speech signals.
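
To make the idea of a 1-dimensional convolutional residual block concrete, the following is a minimal sketch in plain Python. It is not the paper's architecture: the abstract does not state layer counts, kernel sizes, or normalization, so the two-convolution layout, the ReLU activations, and the names `conv1d_same`, `relu`, and `residual_block_1d` are all assumptions chosen for illustration. The sketch only shows the defining feature of a residual block, which is that the input is added back to the output of the convolutional path through a skip connection.

```python
def conv1d_same(x, kernel):
    """Single-channel 1-D convolution (cross-correlation form) with zero
    padding, so the output has the same length as the input.
    Assumes an odd kernel length (an illustrative restriction)."""
    k = len(kernel)
    assert k % 2 == 1, "this sketch assumes an odd kernel length"
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block_1d(x, kernel1, kernel2):
    """Hypothetical residual block: y = ReLU(x + conv(ReLU(conv(x)))).
    The skip connection adds the unmodified input x back to the
    output of the two-convolution path."""
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return relu([a + b for a, b in zip(x, h)])


# Toy input standing in for a short frame of audio samples.
frame = [1.0, -2.0, 3.0]
output = residual_block_1d(frame, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```

If both kernels are all zeros, the convolutional path contributes nothing and the block reduces to a ReLU of its input. This is the property that lets residual networks fall back to a near-identity mapping when extra layers are not helpful.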
