Abstract

State-of-the-art speaker recognition technology performs very well in controlled conditions. However, when speech segments suffer distortions such as noise or reverberation, performance can deteriorate severely. This motivated us to investigate how score distributions diverge from the ideal ones in degraded conditions. We propose a Bayesian network model that assumes two scores exist: one observed and one hidden. The observed, or noisy, score is the one given by the speaker verification system. The hidden, or clean, score is the ideal score that we would obtain in a trial with high-quality speech. A set of quality measures relates the two scores. We applied this network to two tasks. The first consists of rejecting unreliable trials, i.e., trials for which we cannot be sure whether they are target or nontarget. We show that this method outperforms previous approaches based on a different type of Bayesian network. The second task is to compute an improved likelihood ratio that depends on the quality measures. This ratio improved calibration in noisy conditions.
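
To make the idea of a quality-dependent likelihood ratio concrete, the following is a minimal toy sketch and not the model proposed in this work. It assumes, purely for illustration, that the hidden clean score is Gaussian given the trial hypothesis, and that the observed score equals the clean score plus a Gaussian offset whose mean and variance depend on a single discrete quality state. All names (`CLEAN`, `DISTORTION`, `quality_llr`) and all parameter values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical (mean, variance) of the hidden clean score under each hypothesis.
CLEAN = {"target": (2.0, 1.0), "nontarget": (-2.0, 1.0)}

# Hypothetical distortion per quality state: (mean shift, added variance).
# Assumption: observed score = clean score + Gaussian offset.
DISTORTION = {"clean": (0.0, 0.01), "noisy": (-1.0, 1.0)}


def quality_llr(observed, quality):
    """Log-likelihood ratio of the observed score, conditioned on quality.

    The hidden clean score is marginalised analytically: the sum of two
    independent Gaussians is Gaussian with summed means and variances.
    """
    shift, extra_var = DISTORTION[quality]

    def loglik(hypothesis):
        mean, var = CLEAN[hypothesis]
        return gaussian_logpdf(observed, mean + shift, var + extra_var)

    return loglik("target") - loglik("nontarget")


if __name__ == "__main__":
    for q in ("clean", "noisy"):
        print(q, quality_llr(0.0, q))
```

Under these made-up parameters, an observed score of 0 is uninformative in the clean state (log-likelihood ratio of 0) but favours the target hypothesis in the noisy state (log-likelihood ratio of 2), because the assumed distortion shifts all scores downward. This illustrates why a ratio that ignores quality can be miscalibrated in degraded conditions.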
