Abstract

Speaker identification is an emerging capability in modern speech technology. It is a one-to-many mapping technique: an utterance from an unknown speaker is analysed and compared with the speech models (templates) of all known speakers to determine who is speaking. The speaker identification process is divided into two phases, the training phase and the test phase. In the training phase, speech utterances from 15 speakers, both male and female, are used to build individual speaker models by extracting features such as gammatone cepstral coefficients (GTCC) and mel-frequency cepstral coefficients (MFCC). In the test phase, a random utterance spoken by each speaker is compared with the speaker models obtained by k-means clustering in order to identify the speaker. Finally, the MFCC and GTCC feature vectors are compared in terms of their accuracy in predicting the correct speaker. Speaker identification technology (SIT) is used in various applications: in voice biometrics for authentication, in surveillance of telephone conversations, and in forensics for tracing a suspect's voice. It is also used in Google's speech recognition system to unlock devices, with the speaker's voice serving as a password for privacy protection. With these two feature sets, the identification accuracy is expected to be in the range 80–90%. In addition to the two feature extraction methods, a convolutional neural network is applied to the same set of speech utterances, raising the accuracy beyond 95%.

Keywords: Mel-frequency cepstral coefficients; Gammatone cepstral coefficients; Clustering; Emotion recognition
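
The training and test phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the number of clusters, the distance measure, or the scoring rule, so the sketch assumes a codebook of 4 centroids per speaker, squared Euclidean distance, and identification by lowest average distortion. Feature extraction is not shown: the hypothetical helper `fake_frames` generates synthetic 3-dimensional vectors that stand in for per-frame MFCC or GTCC features, and the speaker names are placeholders.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(frames, k, iters=20, seed=0):
    """Plain k-means: returns k centroids (a speaker 'codebook')."""
    rng = random.Random(seed)
    centroids = rng.sample(frames, k)  # initialise from random frames
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: sq_dist(f, centroids[i]))
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def distortion(frames, codebook):
    """Average distance from each frame to its nearest centroid."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, models):
    """Pick the speaker whose codebook gives the lowest distortion."""
    return min(models, key=lambda name: distortion(frames, models[name]))

def fake_frames(center, n, rng, spread=0.5):
    """ASSUMPTION: synthetic stand-in for MFCC/GTCC frames of one speaker."""
    return [[c + rng.gauss(0, spread) for c in center] for _ in range(n)]

rng = random.Random(42)
speaker_centers = {  # hypothetical speakers, well separated in feature space
    "speaker_a": [0.0, 0.0, 0.0],
    "speaker_b": [5.0, 5.0, 5.0],
    "speaker_c": [-5.0, 5.0, 0.0],
}

# Training phase: one k-means codebook per speaker.
models = {name: kmeans(fake_frames(c, 200, rng), k=4)
          for name, c in speaker_centers.items()}

# Test phase: score an unseen utterance against every speaker model.
test_utterance = fake_frames(speaker_centers["speaker_b"], 50, rng)
predicted = identify(test_utterance, models)
print(predicted)
```

In a real system the synthetic frames would be replaced by MFCC or GTCC vectors computed from audio, and the same `identify` step could be run once per feature type to compare their accuracy.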
