In this paper, we propose a method to analytically obtain a linear transformation on conventional Mel frequency cepstral coefficient (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying VTLN processing. There have been many attempts to obtain such a linear transformation, but all previously proposed approaches either modify the signal processing (and therefore do not operate on conventional MFCC), or yield a linear transformation that does not correspond to conventional VTLN warping, or require matrices that must be estimated and are therefore data dependent. In short, the conventional VTLN part of an automatic speech recognition (ASR) system cannot simply be replaced with any of the previously proposed methods. Umesh proposed the idea of using band-limited interpolation to perform VTLN warping on MFCC using plain cepstra. Motivated by this work, Panchapagesan and Alwan proposed a linear transformation to perform VTLN warping on conventional MFCC. However, in their approach, VTLN warping is specified in the Mel-frequency domain and is not equivalent to conventional VTLN. In this paper, we present an approach which also draws inspiration from the work of Umesh, and which we believe for the first time performs conventional VTLN as a linear transformation on conventional MFCC using the ideas of band-limited interpolation. Deriving such a linear transformation to perform VTLN allows us to use the VTLN matrices in a transform-based adaptation framework, with its associated advantages, while still requiring the estimation of only a single parameter. Using four different tasks, we show that our proposed approach has almost identical recognition performance to conventional VTLN on both clean and noisy speech data.
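To make the central claim concrete, the following is a minimal toy sketch of the general idea of folding a frequency warp into a single linear transform on cepstra. It is not the paper's actual construction (which uses band-limited interpolation on conventional MFCC); the function name `vtln_matrix`, the simple linear warp, and the DCT-style maps are all illustrative assumptions.

```python
import numpy as np

def vtln_matrix(n_cep, alpha, n_freq=256):
    """Toy sketch (not the paper's method): build A(alpha) so that
    warped cepstra = A @ cepstra. Illustrates only the core idea that
    warping the (log-)spectrum can be expressed as one linear transform
    acting directly on the cepstral vector."""
    # Uniform frequency grid on [0, pi) and a simple linear warp, clipped at pi.
    omega = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    omega_w = np.clip(alpha * omega, 0.0, np.pi)

    # Inverse-DCT-like map: cepstra -> log spectrum sampled on the WARPED grid.
    C_inv = np.cos(np.outer(omega_w, np.arange(n_cep)))

    # Forward-DCT-like map: log spectrum on the uniform grid -> cepstra.
    C_fwd = np.cos(np.outer(np.arange(n_cep), omega)) * (2.0 / n_freq)
    C_fwd[0] *= 0.5  # DC scaling so that A(alpha=1.0) is close to the identity

    return C_fwd @ C_inv

A = vtln_matrix(13, alpha=1.1)   # one matrix per candidate warp factor alpha
c = np.ones(13)                  # stand-in for a conventional cepstral vector
c_warped = A @ c                 # VTLN warping reduced to a matrix multiply
```

Because the warp enters only through the single scalar `alpha`, speaker normalization reduces to picking one parameter while the resulting matrix can still be plugged into a transform-based adaptation framework.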