Abstract
The conventional i-vector approach to speaker and language recognition is an unsupervised learning paradigm in which a variable-length speech utterance is converted into a fixed-dimensional feature vector (termed an i-vector). The i-vector approach belongs to the broader family of factor analysis models, where the utterance-level adapted means of a Gaussian Mixture Model - Universal Background Model (GMM-UBM) are assumed to lie in a low-rank subspace. The latent variables in the low-rank model are assumed to have a standard Gaussian prior distribution. In this paper, we rework the theory of i-vector modeling in a supervised framework, where the class labels (such as language or accent) of the speech recordings are introduced directly into the i-vector model using a Gaussian mixture prior in which each mixture component is associated with a class label. We provide the mathematical formulation for the minimum mean squared error (MMSE) estimate of the supervised i-vector (s-vector) model. A detailed analysis of the s-vector model is given and contrasted with the traditional i-vector framework. The proposed model is evaluated on language recognition tasks using the NIST Language Recognition Evaluation (LRE) 2017 dataset and on an accent recognition task using the Mozilla Common Voice dataset. In these experiments, the s-vector model provides significant improvements over the conventional i-vector model (relative improvements of up to 24% for the LRE task in terms of the primary detection cost metric).
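As an illustration of the idea only, the following toy sketch computes an MMSE (posterior mean) estimate of a latent vector under a Gaussian mixture prior with one component per class. It is not the formulation derived in the paper, which operates on GMM-UBM statistics. It assumes a single observation `x = m + T y + noise` with isotropic noise, and class-specific prior means with identity covariance. The function `mmse_latent` and all variable names are hypothetical.

```python
import numpy as np

def mmse_latent(x, m, T, noise_var, weights, class_means):
    """Toy MMSE estimate of y in x = m + T y + eps, eps ~ N(0, noise_var * I).

    Assumed prior (illustrative only): y ~ sum_c weights[c] * N(class_means[c], I).
    Returns the posterior mean of y and the class responsibilities.
    """
    D, R = T.shape
    resid = x - m
    # Every prior component has identity covariance, so the posterior
    # precision of y is the same for all components.
    precision = np.eye(R) + T.T @ T / noise_var
    # Marginal covariance of x under any single component.
    marg_prec = np.linalg.inv(T @ T.T + noise_var * np.eye(D))
    # Class responsibilities p(c | x); the Gaussian normalising constant is
    # shared across components and therefore cancels.
    log_resp = np.empty(len(weights))
    for c, (w, mu) in enumerate(zip(weights, class_means)):
        d = resid - T @ mu
        log_resp[c] = np.log(w) - 0.5 * d @ marg_prec @ d
    log_resp -= log_resp.max()
    resp = np.exp(log_resp)
    resp /= resp.sum()
    # Posterior mean of y under each component, shape (C, R).
    rhs = class_means + (T.T @ resid) / noise_var
    comp_means = np.linalg.solve(precision, rhs.T).T
    # The MMSE estimate is the responsibility-weighted average.
    return resp @ comp_means, resp

# Synthetic usage: two classes, observation generated from class 1 without noise.
rng = np.random.default_rng(0)
D, R = 6, 2
T = rng.standard_normal((D, R))
m = rng.standard_normal(D)
class_means = np.array([[-3.0, 0.0], [3.0, 0.0]])
weights = np.array([0.5, 0.5])
x = m + T @ class_means[1]
y_hat, resp = mmse_latent(x, m, T, 0.1, weights, class_means)
```

With a single zero-mean component, the same function reduces to the posterior mean under a standard Gaussian prior, which is the analogue of the conventional i-vector estimate in this toy setting.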