Abstract

In the i‐vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i‐vectors. PLDA defines an i‐vector subspace that compensates for unwanted variability and helps to discriminate among speaker‐phrase pairs. The channel or session variability manifested in i‐vectors is known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, thereby causing a loss of important discriminating information. Moreover, i‐vector estimation itself is known to be poor for short utterances. This paper attempts to address these issues using a simple hierarchy‐based system. A modified fuzzy‐clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i‐vector/PLDA model is trained for each of the subspaces. The sparser alignment owing to the subspace‐specific universal background model and the relatively reduced dimensions of variability in individual subspaces help to train more effective i‐vector/PLDA models. In addition, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i‐vectors using a mixture model technique. As a consequence, vocal source features and i‐vectors tend to carry complementary information. Thus, using vocal source features for classification in a hierarchy tree may help to differentiate some of the speaker‐phrase classes that are otherwise not easily discriminable based on i‐vectors. The proposed technique has been validated on Part 1 of the RSR2015 database, and it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i‐vector/PLDA system.
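
The clustering stage described above can be illustrated with a minimal sketch. It uses standard fuzzy c-means, not the modified variant proposed in the paper, and random 2-D points stand in for vocal source features; the function name `fuzzy_c_means`, its parameters, and the toy data are all illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means (an assumption; not the paper's modified variant)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centres are membership-weighted means of the points.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every centre.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        # Membership update: closer centres receive higher membership.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy stand-in for vocal source features: two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])

centers, U = fuzzy_c_means(X, n_clusters=2)
# Hard routing: each sample goes to the subspace with the highest membership.
subspace = U.argmax(axis=1)
```

In a hierarchy of the kind the abstract outlines, each resulting subspace would then receive its own universal background model and i‐vector/PLDA model; that second stage is not shown here.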
