Abstract
In the i‐vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i‐vectors. PLDA defines an i‐vector subspace that compensates for unwanted variability and helps to discriminate among speaker‐phrase pairs. The channel or session variability manifested in i‐vectors is known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, thereby causing a loss of important discriminating information. In addition, i‐vector estimation itself is known to be poor for short utterances. This paper attempts to address these issues using a simple hierarchy‐based system. A modified fuzzy‐clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i‐vector/PLDA model is trained for each of the subspaces. The sparser alignment owing to the subspace‐specific universal background model and the relatively reduced dimensions of variability in individual subspaces help to train more effective i‐vector/PLDA models. Also, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i‐vectors using a mixture‐model technique. As a consequence, vocal source features and i‐vectors tend to carry complementary information. Thus, using vocal source features for classification in a hierarchy tree may help to differentiate some of the speaker‐phrase classes that are otherwise not easily discriminable based on i‐vectors. The proposed technique has been validated on Part 1 of the RSR2015 database, and it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i‐vector/PLDA system.
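As an illustration of the hierarchy idea described above, the following minimal sketch partitions a feature space with fuzzy clustering and then routes a test vector to one subspace. It uses the standard fuzzy c‐means update, not the modified variant proposed in the paper, whose details the abstract does not give. The function names, the fuzzifier `m = 2.0`, the iteration count, and the toy two‐dimensional points standing in for vocal source features are all assumptions made for illustration; the subspace‐specific i‐vector/PLDA models themselves are not implemented here.

```python
import math
import random

def memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each cluster.

    This is the textbook formula, NOT the paper's modified variant.
    """
    dists = [math.dist(x, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Return cluster centers after a fixed number of FCM iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)  # initialise from the data
    dim = len(points[0])
    for _ in range(n_iter):
        u = [memberships(x, centers, m) for x in points]
        new_centers = []
        for k in range(n_clusters):
            # Each center is the mean of all points weighted by u^m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[j] for w, x in zip(weights, points)) / total
                for j in range(dim)
            ))
        centers = new_centers
    return centers

def route(x, centers, m=2.0):
    """Index of the subspace whose (hypothetical) model would score x."""
    u = memberships(x, centers, m)
    return max(range(len(u)), key=u.__getitem__)

# Toy stand-in for vocal source features: two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers = fuzzy_c_means(points, n_clusters=2)
labels = [route(p, centers) for p in points]
```

In a full system along these lines, each index returned by `route` would select the i‐vector/PLDA model trained on that subspace; how the paper handles points with comparable memberships in several subspaces is not stated in the abstract.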