Abstract

There are two basic types of feature representations for speech signals. The first comprises probabilistic models, such as the Gaussian mixture model (GMM); the second comprises vector-based representations, such as the Gaussian supervector (GSV). Because vector-based representations are easier to use and process, they are more popular than probabilistic model-based ones. In this paper, we begin by explaining the rationale behind two widely used vector-based feature representations, viz. the GSV and the i-vector, and then extend both. The GSV is a supervector (SV) derived from maximum a posteriori (MAP) adaptation; its computation is simple and fast, but its dimensionality is high and fixed. The i-vector is a latent vector (LV) derived from factor analysis (FA); although its computation can be time-consuming because of the additional model parameters, its dimensionality is adjustable. To generalize the GSV, we propose the MAP SV, which is also based on MAP adaptation but can have an even higher dimensionality and thus carry more information. To improve the computational efficiency of the i-vector, we adopt the concept of the mixture of factor analyzers (MFA) and propose the MFA LV, which offers similar flexibility in dimensionality but is faster to compute. Experimental results on speaker identification and verification tasks demonstrate that the MAP SV can be more robust than the GSV, and that the MFA LV is comparable to or better than the i-vector in effectiveness while maintaining higher computational efficiency. With a powerful backend, the GSV and MAP SV are comparable to the i-vector and MFA LV, but the latter two are more flexible in dimensionality.
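To make the GSV construction mentioned above concrete, the following is a minimal NumPy sketch of mean-only MAP adaptation against a diagonal-covariance universal background model (UBM), with the adapted means stacked into a supervector. It is illustrative only, not the paper's implementation: the function name `map_adapt_gsv` and the relevance factor `r` (16 is a commonly cited default) are assumptions, and the GSV dimensionality is fixed at C × D for a C-component, D-dimensional UBM, which is the rigidity the abstract refers to.

```python
import numpy as np

def map_adapt_gsv(ubm_means, ubm_vars, ubm_weights, feats, r=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one
    utterance, returning the concatenated adapted means (the GSV).

    ubm_means, ubm_vars: (C, D) component means and diagonal variances
    ubm_weights:         (C,)   component weights
    feats:               (T, D) acoustic feature frames
    r:                   MAP relevance factor (assumed default: 16)
    """
    C, D = ubm_means.shape
    # Frame-level component posteriors gamma[t, c], computed in the
    # log domain for numerical stability.
    log_prob = np.empty((len(feats), C))
    for c in range(C):
        diff = feats - ubm_means[c]
        log_prob[:, c] = (np.log(ubm_weights[c])
                          - 0.5 * np.sum(np.log(2.0 * np.pi * ubm_vars[c]))
                          - 0.5 * np.sum(diff ** 2 / ubm_vars[c], axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics per component.
    n = gamma.sum(axis=0)                          # N_c, shape (C,)
    f = gamma.T @ feats                            # F_c, shape (C, D)
    # MAP interpolation between the data mean and the UBM prior mean.
    alpha = (n / (n + r))[:, None]
    adapted = alpha * (f / np.maximum(n, 1e-10)[:, None]) + (1.0 - alpha) * ubm_means
    # The GSV: all adapted means stacked into one fixed-length vector.
    return adapted.reshape(-1)                     # shape (C * D,)
```

Under these assumptions, a 1024-component UBM over 39-dimensional features yields a 39,936-dimensional GSV regardless of utterance length, whereas an i-vector or MFA LV extractor can choose its latent dimensionality freely.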
