Abstract
Commonly used robust speaker verification systems¹ are based on time-varying autoregressive spectral estimation (AR) combined with hidden Markov modeling (HMM) or dynamic time warping (DTW). Exhaustive optimization of these methods has culminated in quite reliable verification schemes. It seems unlikely, though, that further significant improvements can readily be obtained along the same path. While short-time AR modeling focuses on the time-varying spectral envelope of an utterance, we introduce a new method that focuses on high-resolution estimates of the time-varying spectral structure of individual pitch periods. The new method employs reduced interference time–frequency distributions (RIDs) in combination with a scale- and translation-invariant pattern recognition technique (STIR). By itself, the new method does not deliver better results than commonly used techniques; however, it is shown that an acceptance/rejection decision derived from both AR-DTW and RID–STIR features greatly improves the performance of the verification system.

¹ We are concerned only with text-dependent verification with cooperative speakers in a low-noise environment.