Abstract

Most state-of-the-art speech-processing methods employ convolutional neural networks (CNNs) that operate on a continuous, one-dimensional (1-D) time stream. For an audio signal, the mel-spectrogram represents attributes of the utterance in the frequency domain, which corresponds to the speech spectrum. Moreover, for time-series speaker signals, CNNs capture characteristics from long-form speech better than classical machine learning or transfer learning models. This paper introduces a jump-connected 1-D CNN with a combined loss function for speaker recognition. The proposed model uses 1-D convolutional layers combined with jump (skip) connections to extract speaker-specific characteristics, reducing temporal and spectral variability and speeding up computation. A combined loss function, comprising a softmax loss, a stable L2-norm loss, and a smooth L1-norm loss, guides the proposed compact convolutional neural network (CCNN) to identify the correct speaker more effectively. We evaluated the proposed framework on various standard and real-time audio datasets. The experimental findings demonstrate that the proposed CCNN outperforms existing approaches, reducing the equal error rate by 9.02%. Our voiceprint identification model also achieves an average speaker recognition rate of 98.76%. The reliability of the 1-D CCNN is additionally tested under various conditions. Other fields of study, such as language modelling, could employ this approach after some fine-tuning.

Relevance of the work: Speaker recognition is an area in which combined machine learning (ML) and deep learning (DL) schemes have the potential to make significant advances in forensic science, automation, and authentication. A compact CNN can improve the identification and verification process by mitigating issues such as false positives and background noise. Extending this approach could also facilitate raga identification and disease treatment therapies.
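The abstract does not include an implementation, but a few hedged sketches may clarify the pipeline it describes. First, the mel-spectrogram front end: a log-mel representation can be computed with librosa, where the sample rate, FFT size, hop length, and 40 mel bands below are common defaults, not values taken from the paper.

```python
import librosa

# Load an utterance and compute a log mel-spectrogram.
# All parameter values here are illustrative assumptions.
y, sr = librosa.load("utterance.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                     hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)   # shape: (40, frames)
```

Next, a minimal PyTorch sketch of a compact 1-D CNN whose blocks carry jump (skip) connections over those features. Layer sizes, depth, and the pooling choice are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JumpBlock(nn.Module):
    """1-D convolutional block that adds its input back to its output
    (a jump/skip connection), easing gradient flow through the stack."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.relu(self.bn(self.conv(x)) + x)   # jump connection

class CCNN(nn.Module):
    """Compact 1-D CNN over mel-spectrogram frames (illustrative sizes)."""
    def __init__(self, n_mels: int = 40, n_speakers: int = 100,
                 channels: int = 64):
        super().__init__()
        self.stem = nn.Conv1d(n_mels, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(JumpBlock(channels), JumpBlock(channels))
        self.head = nn.Linear(channels, n_speakers)

    def forward(self, x: torch.Tensor):
        # x: (batch, n_mels, frames) -- log-mel spectrogram
        h = F.relu(self.stem(x))
        h = self.blocks(h)
        emb = h.mean(dim=-1)           # temporal average pooling -> embedding
        return self.head(emb), emb     # speaker logits and embedding
```

Finally, one plausible reading of the combined loss, assuming the stable L2-norm and smooth L1-norm terms pull embeddings toward learned per-speaker centers in the spirit of center loss. The `centers` parameter, the weights `alpha` and `beta`, and the epsilon stabilisation are all assumptions; the paper's exact formulation may differ.

```python
def combined_loss(logits, emb, targets, centers,
                  alpha=0.1, beta=0.1, eps=1e-8):
    ce = F.cross_entropy(logits, targets)                  # softmax loss
    diff = emb - centers[targets]                          # distance to speaker center
    l2 = torch.sqrt(diff.pow(2).sum(dim=1) + eps).mean()   # eps-stabilised L2 norm
    sl1 = F.smooth_l1_loss(emb, centers[targets])          # smooth L1 (Huber) term
    return ce + alpha * l2 + beta * sl1

# Illustrative usage: centers would be trained alongside the network.
model = CCNN(n_mels=40, n_speakers=100)
centers = nn.Parameter(torch.zeros(100, 64))
x = torch.randn(8, 40, 200)                 # batch of log-mel features
logits, emb = model(x)
loss = combined_loss(logits, emb, torch.randint(0, 100, (8,)), centers)
```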
