Abstract

Speaker identification is the task of recognizing a person from his or her voice. Because speech signals are susceptible to significant variation, it is a challenging task, and conventional speaker identification (SID) systems perform poorly in noisy environments. This study presents a robust speaker identification system based on an auditory-inspired feature called the cochleagram. The cochleagram is generated using a gammatone filterbank with 128 channels spanning 50 to 8000 Hz. A convolutional neural network (CNN) is trained on a combination of cochleagrams constructed from clean speech and from speech with a fixed noise added at a certain signal-to-noise ratio (SNR), referred to as a noise-adapted CNN. The proposed model was then tested against different noises at different SNR levels. Experimental results showed that the proposed system outperformed the existing neurogram-based method under noisy conditions, particularly at very low SNRs, for both text-dependent and text-independent corpora. The proposed system could also be used as a preprocessor in an automatic speech recognition (ASR) system.
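The cochleagram front end described in the abstract can be sketched as follows. This is a minimal illustration under standard assumptions, not the authors' implementation: the ERB-rate channel spacing (Glasberg & Moore), 4th-order gammatone filters, the 1.019 bandwidth factor, the frame sizes, and the `add_noise` SNR-mixing helper are all common choices assumed here.

```python
import numpy as np

def erb_center_freqs(n_channels=128, f_lo=50.0, f_hi=8000.0):
    """Center frequencies spaced uniformly on the ERB-rate scale
    (Glasberg & Moore), as is typical for gammatone filterbanks."""
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return inv_erb(np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_channels))

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """4th-order gammatone impulse response at center frequency fc."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # ERB bandwidth at fc
    b = 1.019 * erb                          # conventional bandwidth factor
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def cochleagram(x, fs, n_channels=128, frame_len=0.020, hop=0.010):
    """Filter x through the gammatone bank, then take per-frame log energies."""
    cfs = erb_center_freqs(n_channels)
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    out = np.empty((n_channels, n_frames))
    for i, fc in enumerate(cfs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        for j in range(n_frames):
            seg = y[j * fhop : j * fhop + flen]
            out[i, j] = np.log10(np.sum(seg ** 2) + 1e-10)
    return out

def add_noise(clean, noise, snr_db):
    """Scale `noise` so the mixture clean + noise has the requested SNR in dB,
    as used when building noise-adapted training data."""
    noise = noise[: len(clean)]
    scale = np.sqrt(np.mean(clean ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10.0)))
    return clean + scale * noise
```

A 0.5 s utterance at 16 kHz then yields a 128-channel time-frequency image (one row per gammatone channel) that can be fed to the CNN, with `add_noise` producing the fixed-noise training variants at the chosen SNR.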
