Abstract

The performance of speaker verification systems is typically measured by their binary decision accuracy. However, in applications where close to 100% accuracy is required, such as the call centers of financial companies, the binary decisions of existing verification systems cannot be relied upon. In such cases, a multi-class verification output (for example, a high, medium, or low verification score) returned by the speaker verification system can still be used by a human agent to reduce the verification time and/or increase the verification accuracy compared to a human-only scenario. In this work, we compare the multi-class output performance of some of the most popular speaker verification systems when a human agent is assumed to be in the verification loop. Performance is measured by the reduction in the number of questions the human agent needs to verify the caller's identity without compromising security. Experiments are performed on the NIST 2010 database under the condition of 8 conversation sides (5 minutes each) of enrollment data and 10 seconds of verification data.
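
To make the multi-class protocol concrete, the following Python sketch maps a raw verification score to a high/medium/low class and adjusts the number of identity questions the agent asks accordingly. The threshold values, class-to-question mapping, and function names are hypothetical assumptions for illustration, not figures or an implementation taken from the paper.

    # Minimal sketch of the human-in-the-loop multi-class protocol.
    # All thresholds and question counts below are illustrative assumptions.

    def classify_score(score: float, low_thr: float = 0.3, high_thr: float = 0.7) -> str:
        """Map a raw verification score to a coarse class for the human agent."""
        if score >= high_thr:
            return "high"
        if score >= low_thr:
            return "medium"
        return "low"

    def questions_to_ask(verification_class: str) -> int:
        """Number of identity questions the agent asks; fewer questions are
        needed when the automatic score is already convincing (hypothetical
        values, not from the paper)."""
        return {"high": 1, "medium": 3, "low": 5}[verification_class]

    if __name__ == "__main__":
        for s in (0.85, 0.50, 0.10):
            c = classify_score(s)
            print(f"score={s:.2f} -> class={c}, ask {questions_to_ask(c)} question(s)")

Under this kind of scheme, the reduction in total questions asked (relative to a human-only baseline) serves as the performance measure described in the abstract.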
