Analysis of cosine distance features for speaker verification

Kuruvachan K George, C Santhosh Kumar, Sunil Sivadas, K.I Ramachandran, Ashish Panda

doi:10.1016/j.patrec.2018.08.019

Abstract

In this paper, we describe a method for representing the acoustic similarity of a target speaker with respect to a set of known speakers as a feature for speaker verification. We propose a novel distance-based representation that encodes the cosine distance between the i-vectors of utterances belonging to the target speaker and those of the reference speakers. The new feature is referred to as the cosine distance feature (CDF) and is used with a support vector machine (SVM) classifier (CDF-SVM). We show that reference speakers who rank high in acoustic similarity to the target speaker are more important for speaker discrimination. A sparse representation of the CDF that retains only the few largest values in the CDF vector, which correspond to the most similar reference speakers, is found to perform better than the baseline CDF system. We also explore a speaker-specific CDF, in which each target speaker has its own subset of the most acoustically similar reference speakers. We show that the acoustic similarities between the target and reference speakers are best captured using an intersection kernel SVM. Experimental results on the core short2-short3 condition of NIST 2008 SRE, for both female and male trials, show that the speaker-specific CDF outperforms the state-of-the-art i-vector and speaker-independent CDF based speaker verification systems.
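The three ingredients named in the abstract (the CDF vector, its sparse variant, and the intersection kernel) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each reference speaker is assumed to be summarized by a single i-vector, and "cosine distance" is taken to mean the cosine score for which larger values indicate greater similarity, as the abstract's reference to the "largest values" suggests.

```python
import numpy as np

def cosine_distance_feature(target_ivec, reference_ivecs):
    """Cosine score of one target i-vector against each reference i-vector.

    Assumption: one i-vector per reference speaker, stacked as rows of
    reference_ivecs with shape (n_reference_speakers, ivector_dim).
    """
    t = target_ivec / np.linalg.norm(target_ivec)
    r = reference_ivecs / np.linalg.norm(reference_ivecs, axis=1, keepdims=True)
    return r @ t  # shape: (n_reference_speakers,)

def sparsify_top_k(cdf, k):
    """Keep the k largest entries of a CDF vector and zero the rest.

    Assumption: k is a hypothetical tuning parameter; the abstract only
    says that "a few" of the largest values are retained.
    """
    sparse = np.zeros_like(cdf)
    top = np.argsort(cdf)[-k:]  # indices of the k largest scores
    sparse[top] = cdf[top]
    return sparse

def intersection_kernel(X, Y):
    """Histogram intersection kernel: K[i, j] = sum_d min(X[i, d], Y[j, d]).

    Assumption: features are non-negative, which is the usual setting for
    this kernel; how negative cosine scores are handled is not stated in
    the abstract.
    """
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# Toy example with 2-dimensional "i-vectors" and three reference speakers.
target = np.array([1.0, 0.0])
references = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cdf = cosine_distance_feature(target, references)
sparse_cdf = sparsify_top_k(cdf, k=1)
```

A Gram matrix produced by `intersection_kernel` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; that choice of library is an assumption made here for illustration.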

Full Text