Abstract
In this paper, we describe a method for representing the acoustic similarity of a target speaker to a set of known reference speakers as a feature for speaker verification. We propose a novel distance-based representation that encodes the cosine distance between the i-vectors of utterances belonging to the target speaker and those of the reference speakers. The new feature, referred to as the cosine distance feature (CDF), is used with a support vector machine (SVM) classifier (CDF-SVM). We show that the reference speakers who rank highest in acoustic similarity to the target speaker are the most important for speaker discrimination. A sparse representation of the CDF, which retains only the few largest values in the CDF vector (those corresponding to the most similar reference speakers), is found to perform better than the baseline CDF system. We also explore a speaker-specific CDF, in which each target speaker has its own subset of the most acoustically similar reference speakers. We show that the acoustic similarities between the target and reference speakers are best captured using an intersection kernel SVM. Experimental results on the core short2-short3 condition of the NIST 2008 SRE, for both female and male trials, show that the speaker-specific CDF outperforms state-of-the-art speaker verification systems based on i-vectors and on the speaker-independent CDF.
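The CDF pipeline summarized above (per-reference cosine scores, top-k sparsification, intersection kernel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The i-vectors are assumed to be given as plain lists of floats. Each CDF entry is the cosine similarity rescaled from [-1, 1] to [0, 1] via (1 + cos) / 2, an assumption made so that the feature is non-negative, which the intersection kernel usually expects. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two i-vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_a * norm_b)


def cosine_distance_feature(target: Sequence[float],
                            references: Sequence[Sequence[float]]) -> List[float]:
    """One entry per reference speaker; larger means more similar to the target.

    Assumption (not from the paper): cosine similarity is rescaled from
    [-1, 1] to [0, 1] so the feature is non-negative for the intersection kernel.
    """
    return [(1.0 + cosine_similarity(target, ref)) / 2.0 for ref in references]


def sparsify_top_k(feature: Sequence[float], k: int) -> List[float]:
    """Keep the k largest entries (most similar reference speakers); zero the rest."""
    ranked = sorted(range(len(feature)), key=lambda i: feature[i], reverse=True)
    keep = set(ranked[:max(k, 0)])
    return [v if i in keep else 0.0 for i, v in enumerate(feature)]


def intersection_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(rows: Sequence[Sequence[float]],
                cols: Sequence[Sequence[float]]) -> List[List[float]]:
    """Kernel matrix K[i][j] = intersection_kernel(rows[i], cols[j])."""
    return [[intersection_kernel(r, c) for c in cols] for r in rows]


if __name__ == "__main__":
    # Toy 2-D "i-vectors": one target utterance and four reference speakers.
    target = [1.0, 0.0]
    references = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.6, 0.8]]
    cdf = cosine_distance_feature(target, references)
    sparse_cdf = sparsify_top_k(cdf, k=2)
    print([round(v, 3) for v in cdf])         # [1.0, 0.5, 0.0, 0.8]
    print([round(v, 3) for v in sparse_cdf])  # [1.0, 0.0, 0.0, 0.8]
```

The top-k step keeps only the reference speakers most similar to the given target, which mirrors the abstract's observation that these speakers carry most of the discriminative information. A Gram matrix built this way could be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.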