Abstract

Choosing an appropriate distribution function for the speech signal or its representations is crucial in statistical speech processing algorithms. Although the most commonly used probability density function (pdf) for speech signals is Gaussian, recent studies have shown the superiority of super-Gaussian pdfs. Much research has focused on the univariate case of the speech signal distribution; in this paper, we instead study the multivariate distributions of the speech signal and its representations, taking as candidates the conventional distribution functions, e.g., multivariate Gaussian and multivariate Laplace, and copula-based multivariate distributions. The copula-based technique is a powerful method for modeling non-Gaussian multivariate distributions with non-linear inter-dimensional dependency. The level of similarity between the candidate pdfs and the real speech pdf in different domains is evaluated using the energy goodness-of-fit test. In our evaluations, the best-fitted distributions for speech signal vectors of different lengths in various domains are determined. A similar experiment is performed for different classes of English phonemes (fricatives, nasals, stops, vowels, and semivowels/glides). The evaluation results demonstrate that the multivariate distribution of speech signals in different domains is mostly super-Gaussian, except for Mel-frequency cepstral coefficients. The results also confirm that the distribution of the different phoneme classes is statistically better modeled by a mixture of Gaussian and Laplace pdfs. The copula-based distributions provide better statistical modeling of vectors representing the discrete Fourier transform (DFT) amplitude of speech vectors with a length shorter than 500 ms.

Highlights

  • Statistical speech processing algorithms have attracted wide interest during the last three decades in numerous applications, e.g., speech coding [1], speech recognition [2, 3], speech synthesis [4], and speech enhancement [5].

  • The best-fitted candidate in the sense of the energy test for the T, real parts of DFT (RDFT), imaginary parts of DFT (IDFT), and discrete cosine transform (DCT) features with frame lengths of 20, 30, and 100 ms is MLD, despite the often-used assumption of a multivariate Gaussian distribution in speech enhancement algorithms [8, 9, 10], but consistent with the univariate Laplace distribution proposed by Martin [6] and Gazor et al. [14].

  • The univariate Rayleigh distribution has been proposed for the amplitude of DFT (ADFT) feature with a short frame length.

  • The change in the best-fitted distribution for linear predictive coefficient (LPC) features from MGLD to MGD verifies this contribution, too.

  • The best-fitted candidate for the Mel-frequency cepstral coefficient (MFCC) features with different frame lengths is MGD, consistent with the assumption of a multivariate Gaussian distribution used in most speech recognition algorithms [2, 3].
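As a rough illustration of what ranking candidates "in the sense of the energy test" can look like, the sketch below computes a two-sample energy distance between an observed sample and samples drawn from each candidate. It is an assumption-laden stand-in, not the paper's procedure: the function name `energy_distance`, the synthetic "observed" data, and the use of independent Laplace components (rather than a true multivariate Laplace distribution) are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist


def energy_distance(x, y):
    """Two-sample energy distance between the rows of x (n, d) and y (m, d).

    Uses the V-statistic form 2*E|X-Y| - E|X-X'| - E|Y-Y'| with Euclidean
    norms; a smaller value means the two samples look more alike.
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    d_xy = cdist(x, y).mean()  # mean distance across the two samples
    d_xx = cdist(x, x).mean()  # mean distance within x (zero diagonal included)
    d_yy = cdist(y, y).mean()  # mean distance within y (zero diagonal included)
    return 2.0 * d_xy - d_xx - d_yy


rng = np.random.default_rng(0)
n, dim = 500, 4

# Hypothetical stand-in for real speech feature vectors (assumption).
observed = rng.laplace(size=(n, dim))

# Candidate samples: Gaussian matched to the unit-Laplace variance (2.0),
# and independent Laplace components (not a true multivariate Laplace).
candidates = {
    "gaussian": rng.normal(scale=np.sqrt(2.0), size=(n, dim)),
    "laplace": rng.laplace(size=(n, dim)),
}

scores = {name: energy_distance(observed, s) for name, s in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the smallest distance
```

In this setup the candidate with the smallest score would be reported as best fitted; a full goodness-of-fit test would additionally need a null distribution for the statistic, which is omitted here.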

Summary

Introduction

Statistical speech processing algorithms have attracted wide interest during the last three decades in numerous applications, e.g., speech coding [1], speech recognition [2, 3], speech synthesis [4], and speech enhancement [5]. Studying and modeling speech signals in the multivariate case typically poses several challenges, e.g., linear or non-linear inter-dimensional dependency, and the sparsity and complexity of the multidimensional space. Traditional hidden Markov model (HMM)-based speech recognition and synthesis algorithms [3, 27] exploit Mel-frequency cepstral coefficients (MFCC); HMM-based speaker recognition systems [13] exploit either linear predictive coding (LPC) or MFCC; HMM-based speech enhancement algorithms use LPC, time, DCT, MFCC, or DFT features [7, 9, 10]; and codebook-driven speech enhancement algorithms [28] employ LPC. All these algorithms assume a multivariate Gaussian pdf for the extracted features of speech signals. The purpose of this section is to briefly review the basic definition of the copula and a number of the most commonly used estimation methods for fitting the copula to real data.
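To make the idea of fitting a copula to data more concrete, here is a minimal sketch of a Gaussian copula with Laplace marginals. It is one possible rank-based (normal-scores) approach and is not claimed to be the estimation method used in the paper; the helper names `fit_gaussian_copula` and `sample_gaussian_copula`, the correlation matrix `true_corr`, and the choice of Laplace marginals are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(data):
    """Estimate a Gaussian-copula correlation matrix from data of shape (n, d)."""
    n = data.shape[0]
    # Pseudo-observations: per-dimension ranks scaled into the open interval (0, 1).
    ranks = np.apply_along_axis(stats.rankdata, 0, data)
    u = ranks / (n + 1.0)
    # Map to normal scores and take their correlation.
    z = stats.norm.ppf(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(corr, marginal_ppf, n, rng):
    """Draw n vectors whose dependence follows corr and whose marginals follow marginal_ppf."""
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    # Uniform marginals carrying the Gaussian dependence; clipped to avoid infinite quantiles.
    u = np.clip(stats.norm.cdf(z), 1e-12, 1.0 - 1e-12)
    return marginal_ppf(u)


rng = np.random.default_rng(0)

# Hypothetical dependence structure (assumption, not from the paper).
true_corr = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

# Synthetic "training" vectors with Laplace marginals and Gaussian-copula dependence.
data = sample_gaussian_copula(true_corr, stats.laplace.ppf, 2000, rng)

corr = fit_gaussian_copula(data)  # recovered dependence structure
samples = sample_gaussian_copula(corr, stats.laplace.ppf, 1000, rng)
```

The design point this illustrates is the separation a copula provides: the marginals (here Laplace) and the inter-dimensional dependency (here a correlation matrix) are modeled independently, so either can be swapped without touching the other.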

Copula model
Gaussian copulas
Student-t copulas
Fit a copula model
Energy test
B: Second best-candidate
Conclusions
Additional file