Application of harmonic plus noise model for enhancing speaker recognition

Gidda Reddy Gangula, Prem C Pandey, Parveen K Lehana

doi:10.1121/1.4787208

Abstract

Speaker recognition systems mostly employ mel frequency cepstral coefficients (MFCC). The performance of these systems is generally degraded by background noise, the transmission medium, and similar factors. Further, they do not perform well in a text‐independent setting with limited training data. For enhanced performance, the set of parameters used should separate the speaker‐dependent information from the linguistic information. To this end, the use of parameters obtained from harmonic plus noise model (HNM)‐based analysis is investigated. HNM divides the speech spectrum into harmonic and noise bands separated by a dynamically varying maximum voiced frequency. This frequency is a speaker‐dependent parameter, and its estimation is not affected by moderate SNR degradation. A speaker recognition system was devised using three HNM parameters, namely, maximum voiced frequency, relative noise band energy, and pitch. It performed comparably to MFCC‐based recognition for a group of ten speakers. An enhanced performan...
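As an illustration of the band split described in the abstract, the following minimal sketch computes a relative noise band energy for a single frame, given a maximum voiced frequency. It is not the authors' implementation: the definition used here (spectral energy above the maximum voiced frequency divided by total spectral energy), the function name `relative_noise_band_energy`, the Hann window, and the synthetic test frame are all assumptions made for illustration. The maximum voiced frequency is taken as an input rather than estimated.

```python
import numpy as np


def relative_noise_band_energy(frame, sample_rate, max_voiced_freq):
    """Fraction of spectral energy above max_voiced_freq.

    Assumed definition for illustration only; the abstract does not
    specify how the parameter is computed.
    """
    frame = np.asarray(frame, dtype=float)
    # Taper the frame to limit spectral leakage across the band boundary.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0
    # Noise band: everything above the maximum voiced frequency.
    return float(power[freqs > max_voiced_freq].sum() / total)


# Hypothetical synthetic frame: ten harmonics of a 200 Hz pitch (up to 2 kHz)
# plus weak white noise, sampled at 16 kHz.
sample_rate = 16000
t = np.arange(512) / sample_rate
rng = np.random.default_rng(0)
harmonic = sum(np.sin(2 * np.pi * k * 200 * t) for k in range(1, 11))
frame = harmonic + 0.1 * rng.standard_normal(512)

voiced_ratio = relative_noise_band_energy(frame, sample_rate, max_voiced_freq=2500.0)

# A pure 5 kHz tone lies entirely in the noise band for the same boundary.
tone = np.sin(2 * np.pi * 5000 * t)
tone_ratio = relative_noise_band_energy(tone, sample_rate, max_voiced_freq=2500.0)
```

For the strongly harmonic frame the ratio is close to zero, while for the high-frequency tone it is close to one. In a complete system the boundary would be re-estimated frame by frame, since the abstract describes the maximum voiced frequency as dynamically varying.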
