Approximate Entropy and Empirical Mode Decomposition for Improved Speaker Recognition

Richard A Metzger, Donald L Hall, John F Doherty, David M Jenkins

doi:10.1142/s2424922x20500114

Abstract

When processing real-world recordings of speech, it is highly probable that noise will be present at some point in the signal. The problem is compounded when the noise occurs in short, impulsive bursts at random intervals. Traditional signal processing methods for detecting speech rely on the spectral energy of the incoming signal to determine whether a segment contains speech. However, when noise is present, this simple energy detection is prone to falsely flagging noise as speech. This paper demonstrates an alternative way of processing a noisy speech signal that combines information-theoretic and signal processing principles to differentiate speech segments from noise. This preprocessing technique allows a speaker recognition system to train statistical speaker models using noise-corrupted speech files and to construct models statistically similar to those constructed from noise-free data. The preprocessing method is shown to outperform traditional spectrum-based methods for both low-entropy and high-entropy noise in low signal-to-noise ratio environments, with a reduction in feature space distortion as measured by the Cauchy–Schwarz (CS) distance metric.
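The failure mode of energy-based detection described above can be illustrated with a minimal sketch. It is not the detector evaluated in the paper: the function name `frame_energy_detector`, the frame length, the -30 dB threshold, and the synthetic signal (a 200 Hz tone standing in for speech, plus one impulsive noise burst) are all assumptions chosen for illustration.

```python
import numpy as np

def frame_energy_detector(signal, frame_len=256, threshold_db=-30.0):
    """Hypothetical detector: flag frames whose mean energy, relative to the
    loudest frame, exceeds threshold_db. All parameters are illustrative."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    energy = np.mean(frames ** 2, axis=1)
    # Energy in dB relative to the loudest frame; small offset avoids log(0).
    energy_db = 10.0 * np.log10(energy / np.max(energy) + 1e-12)
    return energy_db > threshold_db

rng = np.random.default_rng(0)
fs = 8000                                   # assumed sample rate in Hz
t = np.arange(fs) / fs
signal = 0.001 * rng.standard_normal(fs)    # quiet background

# Stand-in for speech: a 200 Hz tone covering frames 8..15.
signal[2048:4096] += 0.5 * np.sin(2 * np.pi * 200 * t[2048:4096])

# Impulsive noise burst covering exactly one frame (frame 24).
signal[6144:6400] += 0.5 * rng.standard_normal(256)

flags = frame_energy_detector(signal)
print("frames flagged as speech:", np.flatnonzero(flags))
```

In this sketch the burst frame carries as much energy as the tone frames, so an energy threshold alone cannot tell them apart and flags both. That is the kind of false detection the abstract attributes to simple energy-based methods.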

Full Text