Abstract

The performance of many speech processing algorithms depends on modeling speech signals with appropriate probability distributions. Various distributions, such as the Gamma, Gaussian, generalized Gaussian, and Laplace distributions as well as multivariate Gaussian and Laplace distributions, have been proposed in the literature to model speech segments of different lengths, typically below 200 ms, in different domains. In this paper, we fit Laplace and Gaussian distributions to speech short-time Fourier transform (STFT) coefficients to obtain statistical models at high spectral resolution (segment length >500 ms) and at low spectral resolution (segment length <10 ms). The distribution parameters were estimated by maximum-likelihood estimation. We found that speech STFT coefficients at high spectral resolution can be modeled by the Laplace distribution, whereas at low spectral resolution neither the Laplace nor the Gaussian distribution provided a good fit. Spectral-domain modeling of speech at different spectral resolutions helps in understanding the perceptual stability of hearing, which is necessary for the design of digital hearing aids.
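A minimal sketch of the ML fitting step follows, assuming the STFT coefficients have already been reduced to a 1-D array of real values (the abstract does not say whether real parts, imaginary parts, or both are fitted). Both models have closed-form ML estimates: for the Laplace distribution the location is the sample median and the scale is the mean absolute deviation about it; for the Gaussian they are the sample mean and the population (ddof=0) standard deviation. The function names are hypothetical, and comparing maximized log-likelihoods is only one possible goodness-of-fit check, not necessarily the one used in the paper.

```python
import numpy as np


def fit_laplace_ml(x):
    """ML estimates (mu, b) of a Laplace distribution from real samples x."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)               # ML location estimate: sample median
    b = np.mean(np.abs(x - mu))     # ML scale estimate: mean absolute deviation about mu
    return mu, b


def fit_gaussian_ml(x):
    """ML estimates (mu, sigma) of a Gaussian distribution from real samples x."""
    x = np.asarray(x, dtype=float)
    return np.mean(x), np.std(x)    # np.std defaults to ddof=0, the ML estimate


def laplace_loglik(x, mu, b):
    """Log-likelihood of IID samples x under Laplace(mu, b)."""
    x = np.asarray(x, dtype=float)
    return -x.size * np.log(2.0 * b) - np.sum(np.abs(x - mu)) / b


def gaussian_loglik(x, mu, sigma):
    """Log-likelihood of IID samples x under N(mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return (-0.5 * x.size * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))


def compare_fits(x):
    """Fit both models by ML and return their maximized log-likelihoods."""
    mu_l, b = fit_laplace_ml(x)
    mu_g, sigma = fit_gaussian_ml(x)
    return {
        "laplace": laplace_loglik(x, mu_l, b),
        "gaussian": gaussian_loglik(x, mu_g, sigma),
    }
```

Because both models have two parameters, the one with the larger maximized log-likelihood is also preferred by AIC or BIC.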

Highlights

  • Several speech processing methods, such as speech coding, speaker/speech recognition, speech synthesis, speech enhancement, voice activity detection (VAD), independent component analysis (ICA), and speaker diarization, require statistical modeling of speech signals

  • We show that the distribution of short-time Fourier transform (STFT) coefficients of speech segments longer than 500 ms can be modeled accurately by the Laplacian distribution (LD), with a small RMS error in the estimated LD parameters, which supports the validity of the estimates

  • We demonstrate through computer simulation that the LD fits STFT coefficients of speech with high spectral resolution reasonably accurately, as shown in Figures 2–5 for arbitrary speech segments from different speakers
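The segment length sets the spectral resolution: an N-sample window at sampling rate f_s gives a bin spacing of f_s/N. At an assumed 16 kHz sampling rate, a 500 ms segment (8000 samples) gives 2 Hz bins and a 10 ms segment (160 samples) gives 100 Hz bins. The NumPy sketch below extracts STFT coefficients for a chosen segment length. The Hann window, 50% overlap, and pooling of real and imaginary parts are assumptions rather than settings taken from the paper. The DC and Nyquist bins are also excluded because their imaginary parts are always zero. The function names are hypothetical.

```python
import numpy as np


def stft_coefficients(x, fs, segment_ms, hop_ms=None):
    """Hann-windowed STFT of a 1-D signal.

    Returns (freqs, coeffs), where coeffs has shape (n_frames, n_bins).
    segment_ms sets the window length N and hence the bin spacing fs / N.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(fs * segment_ms / 1000.0))            # window length in samples
    hop = n // 2 if hop_ms is None else int(round(fs * hop_ms / 1000.0))  # default 50% overlap
    if n < 2 or hop < 1 or x.size < n:
        raise ValueError("signal too short or invalid segment/hop length")
    window = np.hanning(n)
    n_frames = 1 + (x.size - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * window for i in range(n_frames)])
    coeffs = np.fft.rfft(frames, axis=1)                # one-sided spectrum per frame
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, coeffs


def pooled_real_samples(freqs, coeffs, fs):
    """Pool real and imaginary parts of all bins strictly between 0 and fs/2."""
    keep = (freqs > 0) & (freqs < fs / 2.0)             # drop DC and Nyquist (purely real)
    c = coeffs[:, keep]
    return np.concatenate([c.real.ravel(), c.imag.ravel()])
```

For a 2 s signal at 16 kHz, `stft_coefficients(x, 16000, 500)` and `stft_coefficients(x, 16000, 10)` give coefficient arrays of shape (7, 4001) and (399, 81). Their pooled samples can then be passed to an ML fitting routine.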


Summary

Introduction

Several speech processing methods, such as speech coding, speaker/speech recognition, speech synthesis, speech enhancement, voice activity detection (VAD), independent component analysis (ICA), and speaker diarization, require statistical modeling of speech signals. We show that the distribution of short-time Fourier transform (STFT) coefficients (spectral domain) of speech segments longer than 500 ms can be modeled accurately by the Laplacian distribution (LD), with a small RMS error in the estimated LD parameters, which supports the validity of the estimates. In maximum-likelihood (ML) estimation, the observed STFT coefficients of speech are assumed to be independent and identically distributed (IID), and the distribution parameters that maximize their likelihood function are estimated. Treating the DFT coefficients of speech as IID is a standard assumption in the literature [10,11,12]. The Fisher information associated with the estimated parameters [17]…
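For IID Laplace samples with scale b, the per-sample Fisher information is 1/b² for the location and 1/b² for the scale, and the two parameters are orthogonal. The Cramér-Rao bound therefore gives an RMS error of about b/√n for each ML estimate from n coefficients. The sketch below compares that bound with a Monte Carlo estimate of the RMS error of the ML scale estimate. It assumes the textbook Laplace model on synthetic data and does not reproduce the computation in [17]. The function names are hypothetical.

```python
import numpy as np


def laplace_fisher_rms_bound(b, n):
    """Cramér-Rao RMS bound b / sqrt(n) for the Laplace location and scale.

    Per-sample Fisher information is 1/b**2 for each parameter, so the
    variance bound from n IID samples is b**2 / n.
    """
    return b / np.sqrt(n)


def monte_carlo_scale_rms(b, n, trials=2000, seed=0):
    """Empirical RMS error of the ML scale estimate over repeated Laplace draws."""
    rng = np.random.default_rng(seed)
    x = rng.laplace(loc=0.0, scale=b, size=(trials, n))
    mu_hat = np.median(x, axis=1, keepdims=True)        # ML location per trial
    b_hat = np.mean(np.abs(x - mu_hat), axis=1)         # ML scale per trial
    return np.sqrt(np.mean((b_hat - b) ** 2))
```

The ML scale estimator is asymptotically efficient, so the empirical RMS error should approach the bound as n grows. Under this model, a small RMS error follows from fitting many coefficients.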

Experimental Procedure and Discussion of Results
LD fit for STFT
Conclusions
