Abstract
Various biometric security systems, such as face recognition, fingerprint, voice, hand geometry, and iris, have been developed. Apart from being a communication medium, the human voice is also a form of biometrics that can be used for identification. Voice has unique characteristics that can be used as a differentiator between one person and another. A sound speaker recognition system must be able to pick up the features that characterize a person's voice. This study aims to develop a human speaker recognition system using the Convolutional Neural Network (CNN) method. This research proposes improvements in the fine-tuning layer in CNN architecture to improve the Accuracy. The recognition system combines the CNN method with Mel Frequency Cepstral Coefficients (MFCC) to perform feature extraction on raw audio and K Nearest Neighbor (KNN) to classify the embedding output. In general, this system extracts voice data features using MFCC. The process is continued with feature extraction using CNN with triplet loss to obtain the 128-dimensional embedding output. The classification of the CNN embedding output uses the KNN method. This research was conducted on 50 speakers from the TIMIT dataset, which contained eight utterances for each speaker and 60 speakers from live recording using a smartphone. The accuracy of this speaker recognition system achieves high-performance accuracy. Further research can be developed by combining different biometrics objects, commonly known as multimodal, to improve recognition accuracy further.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: International Journal on Advanced Science, Engineering and Information Technology
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.