Speech Signal In Time Domain Research Articles

The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors together with fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, a sinusoidal model and a source-filter model of speech are compared by listening tests and spectrogram analysis, with the result that the sinusoidal model provides higher quality speech reconstruction. Analysis of the sinusoidal model shows that clean speech reconstruction requires both a noise-free spectral envelope and a robust estimate of the fundamental frequency and voicing. Investigation into fundamental frequency estimation reveals that an auditory model-based approach performs better than other methods of estimation. This leads to the proposal of an integrated front-end which uses the auditory model both for fundamental frequency and voicing estimation and as the filterbank stage in MFCC extraction, thereby reducing computation. Applying spectral subtraction to the auditory model parameters improves the spectral envelope estimates needed for clean speech reconstruction. Experiments on the Aurora connected digits database show that the auditory model-based MFCCs give performance comparable to that attained with conventional MFCCs. Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.
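The spectral subtraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each frame is a plain list of non-negative filterbank channel energies, that the noise estimate is a simple average over frames known to contain no speech, and that negative results are floored at a fraction of the noisy value. The function names and the `floor` parameter are hypothetical.

```python
def estimate_noise(noise_frames):
    """Average each channel over frames assumed to contain noise only."""
    count = len(noise_frames)
    return [sum(channel) / count for channel in zip(*noise_frames)]


def spectral_subtract(noisy_frame, noise_estimate, floor=0.01):
    """Subtract the noise estimate from each channel.

    Any channel that would fall below floor * noisy value is clamped to
    that floor (assumed here to avoid negative or zero energies).
    """
    return [
        max(noisy - noise, floor * noisy)
        for noisy, noise in zip(noisy_frame, noise_estimate)
    ]
```

A caller would typically estimate the noise from the leading frames of an utterance and then apply `spectral_subtract` to every subsequent frame before computing cepstral coefficients; that usage pattern is likewise an assumption, not something stated in the abstract.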

A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less commonly, speaker recognition. The sub-band approach often delivers performance improvements (especially in the presence of noise), although not always. This raises the question of precisely when and how sub-band processing might be advantageous, which is difficult to answer because there is as yet only a rudimentary theoretical framework guiding this work. We describe a simple sub-band speaker recognition system designed to facilitate experimentation aimed at increasing understanding of the approach. The system splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own cepstral-based recognition system, and the outputs of these systems are combined using the sum rule to produce a final decision. We find that sub-band processing leads to worthwhile reductions in both the verification and identification error rates relative to the wideband system: for clean speech, the identification error rate falls from 3.33% to 0.56% and the equal error rate for verification falls by approximately 50%. The hypothesis is advanced that, unlike the wideband system, sub-band processing effectively constrains the free parameters of the speaker models to be more uniformly deployed across frequency; as such, it offers a practical solution to the bias/variance dilemma of data modeling. Much remains to be done to explore fully the new paradigm of sub-band processing. Accordingly, several avenues for future work are identified. In particular, we aim to explore in more depth the hypothesis that sub-band processing offers a practical solution to the bias/variance dilemma.
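Two ideas from this abstract, mel-spaced sub-bands and sum-rule combination, can be sketched as follows. This is an illustration under stated assumptions rather than the system described: the mel conversion uses the common formula 2595·log10(1 + f/700), the 0 to 4000 Hz range in the usage note is an assumed telephone-style bandwidth, and each sub-band recognizer is assumed to output one score per speaker in a dictionary. All function names are hypothetical, and the second-order filters themselves are not implemented.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using a commonly used formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_bands, f_low, f_high):
    """Return n_bands center frequencies equally spaced on the mel scale."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_bands + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_bands)]


def sum_rule_decision(band_scores):
    """Sum per-speaker scores across sub-bands and pick the largest total.

    band_scores is a list with one dict per sub-band, mapping a speaker
    label to that sub-band recognizer's score for the speaker.
    """
    totals = {}
    for scores in band_scores:
        for speaker, score in scores.items():
            totals[speaker] = totals.get(speaker, 0.0) + score
    return max(totals, key=totals.get)
```

Calling `mel_center_frequencies(16, 0.0, 4000.0)` gives 16 center frequencies that are closer together at low frequencies than at high ones, and `sum_rule_decision` returns the speaker whose scores, added over all sub-bands, are highest.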

Articles published on Speech Signal In Time Domain

An efficient speech perceptual hashing authentication algorithm based on DWT and symmetric ternary string

Frame Selection for Robust Speaker Identification: A Hybrid Approach

Estimating acoustic speech features in low signal-to-noise ratios using a statistical framework

Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization

Design and Implementation of a LabVIEW-Based Sound Card Data Acquisition and Processing System

Multi-Resolution Speech Spectrogram

Searching-and-averaging method of underdetermined blind speech signal separation in time domain

Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end

Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing

Estimation of articulatory movement and its application to speech synthesis

Comparison of several speech signal feature parameters for automatic speech recognition
