Abstract

Speaker identification performance is almost perfect in neutral talking environments; however, the performance is deteriorated significantly in shouted talking environments. This work is devoted to proposing, implementing and evaluating new models called Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) to alleviate the deteriorated performance in the shouted talking environments. These proposed models possess the characteristics of both Circular Suprasegmental Hidden Markov Models (CSPHMMs) and Second-Order Suprasegmental Hidden Markov Models (SPHMM2s). The results of this work show that CSPHMM2s outperform each of: First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s) and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s) in the shouted talking environments. In such talking environments and using our collected speech database, average speaker identification performance based on LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s and CSPHMM2s is 74.6%, 78.4%, 78.7% and 83.4%, respectively. Speaker identification performance obtained based on CSPHMM2s is close to that obtained based on subjective assessment by human listeners.

Highlights

  • Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information embedded in speech signals

  • To evaluate the proposed models, speaker identification performance based on such models is compared separately with that based on each of LTRSPHMM1s, LTRSPHMM2s, and CSPHMM1s in the two talking environments

  • It is evident from this table that each of LTRSPHMM1s, LTRSPHMM2s, CSPHMM1s, and CSPHMM2s perform almost perfect in the neutral talking environments

Read more

Summary

Introduction

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information embedded in speech signals. Speaker recognition involves two applications: speaker identification and speaker verification (authentication). Speaker identification is the process of finding the identity of the unknown speaker by comparing his/her voice with voices of registered speakers in the database. Speaker identification can be used in criminal investigations to determine the suspected persons who generated the voice recorded at the scene of the crime. Speaker identification can be used in civil cases or for the media. These cases include calls to radio stations, local or other government authorities, insurance companies, monitoring people by their voices, and many other applications

Objectives
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.