Abstract

Recent performance gains in automatic speaker recognition have come mainly from session variability compensation techniques, most of them based on factor analysis (FA), which have reduced the equal error rate (EER) of state-of-the-art systems by a factor of 10 in less than five years (e.g., EER < 2% for NIST SRE 2008 telephone speech). Moreover, once speech features have been extracted, such systems can compute millions of comparisons a thousand times faster than real time. However, challenges remain: when the conditions of the FA training database do not match those of the speech being compared, the effectiveness of the compensation decreases significantly. This problem is especially relevant in forensic voice comparison, where speech matching the operational conditions is usually scarce. In this presentation we show the impact of this effect in realistic simulated case studies. We use the Baeza–Ahumada IV database, which contains speech acquired with the facilities that the Spanish Guardia Civil uses in its daily work. We also present algorithms to handle sparsity in the data used for training FA models. Finally, we outline future research plans to improve session variability compensation performance in forensically realistic conditions.
