Abstract

Speaker recognition is the process of recognizing a person by his or her voice, using specific features extracted from the voice signal. Automatic speaker recognition (ASR) is a biometric authentication technology. In the last decade, the speaker recognition field has seen many advances, along with many new techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent work in ASR technology. The study discusses several ASR modeling techniques, such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we find that MFCC and GMM can be considered the most successful techniques in the field of speaker recognition so far.

Highlights

  • Speaker Recognition (SR) is an automated technique for identifying an individual on the basis of his or her voice signal. It is a biometric method, like fingerprint, palm, retina, iris, and face recognition

  • The results show that Mel-frequency cepstral coefficients (MFCC) give higher-quality results than Inner Hair Cell Coefficients (IHC), especially when combined with pitch and formants

  • Although speaker recognition systems have proved their ability in the area of biometrics, factors such as the dynamic behavior of the speech signal and the requirement of working in real time still increase the complexity of the process

Summary

INTRODUCTION

Speaker Recognition (SR) is an automated technique for identifying an individual on the basis of his or her voice signal. It is a biometric method, like fingerprint, palm, retina, iris, and face recognition. The main difference between speaker recognition and other biometrics is that speaker recognition can be considered the only technology that processes acoustic information, whereas other methods usually use image information. Another significant difference is that it can work over telephone equipment, which makes it more broadly applicable in diverse settings. Automatic speaker recognition (ASR) can be implemented in two ways: text-dependent speaker recognition (TDSR) and text-independent speaker recognition (TISR). TDSR is primarily applied to speaker verification, whereas TISR is primarily applied to speaker identification [2].
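The difference between the two task types mentioned above, verification and identification, can be illustrated with a minimal sketch. It is not taken from the paper: the speaker names, the scores, the threshold, and the function names `identify` and `verify` are all hypothetical, and the scores merely stand in for whatever a trained speaker model would output (for example, a log-likelihood per enrolled speaker).

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification (1:N): pick the enrolled speaker with the highest score."""
    return max(scores, key=lambda name: scores[name])


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Verification (1:1): accept the claimed identity only if its score
    reaches the decision threshold."""
    return scores[claimed] >= threshold


# Hypothetical per-speaker scores for one test utterance (assumed values,
# e.g. log-likelihoods from per-speaker models; higher means a better match).
scores = {"alice": -41.2, "bob": -37.5, "carol": -44.0}

best_match = identify(scores)                          # compares all speakers
accepted = verify(scores, "alice", threshold=-40.0)    # checks one claim only
print(best_match, accepted)
```

Identification compares the utterance against every enrolled speaker, while verification tests a single claimed identity against a threshold, so the threshold choice trades off false acceptances against false rejections.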

SPEAKER RECOGNITION DEVELOPMENT
FEATURE EXTRACTION TECHNIQUES
Formants and Entropies
Hybrid PPCA-FA
Clustering Algorithms
Findings
DISCUSSION
CONCLUSIONS