Abstract

Speaker recognition is the process of recognizing a speaker from speaker-specific information in the voice. Speaker recognition systems can be classified as text-dependent or text-independent. In a text-dependent system, the recognition phrases are fixed or known beforehand; for example, the user can be prompted to read a randomly selected sequence of numbers. In a text-independent system, by contrast, there are no constraints on the words the speakers are allowed to use, so what is spoken in training and what is uttered in actual use may have completely different content. The domain of speaker recognition can be further divided into speaker identification and speaker verification. Speaker verification evaluates whether a voice belongs to a claimed person, while speaker identification tries to determine which person it belongs to. In this paper, Mel-frequency cepstral coefficients (MFCC) were extracted from the audio files. These features were then fed to a convolutional neural network (CNN), which was then optimized to increase model accuracy. Over six runs with varying parameters, a maximum accuracy of approximately 97% was achieved.
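
The difference between verification and identification can be illustrated with a minimal sketch. It is not the CNN-based method of this paper: it assumes each utterance has already been reduced to a fixed-length feature vector (for instance, averaged MFCCs) and compares vectors by cosine similarity. The speaker names, the vectors, the helper functions, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_vec, claimed_vec, threshold=0.8):
    """Verification: a yes/no decision about one claimed identity.
    The threshold is an assumed value, not taken from the paper."""
    return cosine_similarity(test_vec, claimed_vec) >= threshold

def identify(test_vec, enrolled):
    """Identification: pick the best match among all enrolled speakers."""
    return max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))

# Hypothetical enrolled speakers and a hypothetical test utterance.
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
test_vec = [0.9, 0.1, 0.2]

print(verify(test_vec, enrolled["alice"]))  # claim "alice" is checked against one model
print(verify(test_vec, enrolled["bob"]))    # claim "bob" is checked against one model
print(identify(test_vec, enrolled))         # all models are compared, best one wins
```

Verification answers a one-to-one question about a claimed identity, whereas identification is a one-to-many search over every enrolled speaker, which is the setting a classifier such as a CNN over MFCC features addresses.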
