High-performance session variability compensation in forensic automatic speaker recognition.

Daniel Ramos, Joaquin Gonzalez‐Rodriguez, Javier Gonzalez‐Dominguez

doi:10.1121/1.3508453

Abstract

Recently, the main performance improvement in automatic speaker recognition technology has come from session variability compensation techniques, mainly based on factor analysis (FA), which have reduced the equal error rate (EER) of state-of-the-art systems by a factor of 10 in less than five years (e.g., EER < 2% for NIST SRE 2008 telephone speech). Moreover, once speech features are extracted, such systems can compute millions of comparisons a thousand times faster than real time. However, challenges remain: if the conditions of the FA training database do not match those of the speech used for comparison, the effectiveness of the compensation decreases significantly. This problem is especially relevant in forensic voice comparison, where speech matching the operational conditions is usually scarce. In this presentation we show the impact of this effect in realistic simulated case studies. We use the Baeza–Ahumada IV database, which contains speech acquired with the facilities that the Spanish Guardia Civil uses in its daily work. We also present algorithms to handle sparsity in the data used for training FA models. Finally, we outline future research plans to improve session variability compensation performance in forensically realistic conditions.
