Abstract

Gaussian mixture models (GMMs) are widely used in speech and speaker recognition. This study explores the idea that a mixture of skew Gaussians might better capture feature vectors that tend to have skewed empirical distributions. It begins by deriving an expectation-maximisation (EM) algorithm to train a mixture of two-piece skew Gaussians, which turns out to be not much more complicated than the usual EM algorithm used to train symmetric GMMs. Next, the algorithm is used to compare skew and symmetric GMMs in some simple speaker recognition experiments that use Mel frequency cepstral coefficients (MFCC) and line spectral frequencies (LSF) as the feature vectors. MFCC are among the most popular feature vectors in speech and speaker recognition applications. LSF were chosen because they exhibit significantly more skewed distributions than MFCC and because they are widely used [together with the related immittance spectral frequencies (ISF)] in speech transmission standards. In the reported experiments, models with skew Gaussians performed better than models with symmetric Gaussians, and skew GMMs with LSF compared favourably with both skew and symmetric GMMs that used MFCC.
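
The abstract does not reproduce the update equations, so the following is a rough, illustrative sketch only: a generalised EM loop for a one-dimensional mixture of two-piece (split) Gaussians. The density parameterisation, the closed-form scale updates for a fixed mode, the crude grid search used to refine each component's mode, and all function names (two_piece_pdf, m_step_scales, em_two_piece_mixture) are assumptions made for illustration, not the paper's derivation. The E-step is identical in form to the symmetric-GMM case, which is one sense in which such an algorithm is "not much more complicated".

```python
# Illustrative sketch (NOT the paper's exact algorithm): generalised EM for a
# 1-D mixture of two-piece (split) Gaussians with assumed density
#   f(x) = C * exp(-(x - mu)^2 / (2*sl^2))  for x <  mu
#   f(x) = C * exp(-(x - mu)^2 / (2*sr^2))  for x >= mu,  C = sqrt(2/pi)/(sl+sr).
import numpy as np

def two_piece_pdf(x, mu, sl, sr):
    c = np.sqrt(2.0 / np.pi) / (sl + sr)
    s = np.where(x < mu, sl, sr)                  # side-dependent scale
    return c * np.exp(-0.5 * ((x - mu) / s) ** 2)

def m_step_scales(x, w, mu):
    """Closed-form weighted MLE of (sl, sr) for a fixed mode mu."""
    n = w.sum()
    s_left = np.sum(w * np.where(x < mu, (x - mu) ** 2, 0.0))
    s_right = np.sum(w * np.where(x >= mu, (x - mu) ** 2, 0.0))
    s_left, s_right = max(s_left, 1e-12), max(s_right, 1e-12)
    r = (s_right / s_left) ** (1.0 / 3.0)         # optimal ratio sr/sl
    sl = np.sqrt(s_left * (1.0 + r) / n)
    sr = np.sqrt(s_right * (1.0 + 1.0 / r) / n)
    return sl, sr

def em_two_piece_mixture(x, k=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)     # initialise modes on data
    sl = np.full(k, x.std()); sr = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities, exactly as for a symmetric GMM.
        dens = np.stack([pi[j] * two_piece_pdf(x, mu[j], sl[j], sr[j])
                         for j in range(k)])      # shape (k, n)
        gamma = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)
        # M-step: mixing weights, then per-component mode (profile-likelihood
        # grid search over data quantiles) and closed-form scales.
        pi = gamma.sum(axis=1) / len(x)
        for j in range(k):
            w = gamma[j]
            grid = np.quantile(x, np.linspace(0.02, 0.98, 49))
            ll = [np.sum(w * np.log(two_piece_pdf(x, m, *m_step_scales(x, w, m))
                                    + 1e-300)) for m in grid]
            mu[j] = grid[int(np.argmax(ll))]
            sl[j], sr[j] = m_step_scales(x, w, mu[j])
    return pi, mu, sl, sr
```

In practice one would monitor the total log-likelihood for convergence and replace the grid search with the paper's exact mode update; the grid merely keeps this sketch short and dependency-free.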
