Abstract

Conventional speaker identification systems require carefully designed features to achieve high identification accuracy. With deep learning, these features are learned rather than hand-designed. Advances in deep neural network algorithms and techniques have led to their growing use in speaker identification in place of conventional systems. In this paper, we use a convolutional neural network (CNN) that takes a Mel-spectrogram as input to identify speakers. Experiments on the TIMIT dataset evaluate the proposed CNN architecture and compare it with state-of-the-art systems on clean and noisy speech samples.
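To illustrate the kind of input described above, the following is a minimal sketch of how a log Mel-spectrogram can be computed from a waveform using NumPy. It is not the paper's preprocessing pipeline: the function names, the 512-sample FFT window, the 160-sample hop, and the 40 Mel bands are all assumptions chosen for illustration. The 16 kHz sample rate matches TIMIT recordings.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    # Assumed illustrative settings: 32 ms Hann window, 10 ms hop, 40 Mel bands.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return 10.0 * np.log10(mel + 1e-10).T                  # (n_mels, n_frames), in dB
```

The resulting two-dimensional array (Mel bands by time frames) can be treated like a single-channel image, which is what makes it a natural input for a CNN.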
