On pre-image iterations for speech enhancement.

Christina Leitner, Franz Pernkopf

doi:10.1186/s40064-015-0983-x

Abstract

In this paper, we apply kernel PCA to speech enhancement and derive pre-image iterations for the same task. Both methods use a Gaussian kernel. The kernel variance serves as a tuning parameter that has to be adapted to the SNR and the desired degree of de-noising. To adapt pre-image iterations to arbitrary SNRs, we develop a method that derives a suitable value for the kernel variance from a noise estimate. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data is corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results for the generalized subspace method, spectral subtraction, and the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve performance similar to that of the reference methods. The speech recognition experiments show that utterances processed by pre-image iterations achieve consistently better word recognition accuracy than both the unprocessed noisy utterances and the utterances processed by the generalized subspace method.
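The corruption of clean speech at a fixed SNR, as described in the abstract, can be sketched as follows. This is a minimal illustration for the white-noise case only, not the authors' data preparation: the function name `add_white_noise` and the sine tone standing in for a speech signal are assumptions made for the example.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng=None):
    """Return speech plus white Gaussian noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(speech.shape)
    p_speech = np.mean(speech ** 2)   # average signal power
    p_noise = np.mean(noise ** 2)     # average power of the unscaled noise
    # Choose the gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Stand-in for a speech signal: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = {snr: add_white_noise(clean, snr, np.random.default_rng(0))
         for snr in (0, 5, 10, 15)}
```

Because the gain is computed from the measured powers of the two signals, the resulting SNR matches the target for that particular noise realization rather than only on average.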

Highlights

Speech enhancement is important in the field of speech communications and speech recognition.
6.1 Experiment 1: Kernel principal component analysis (PCA), pre-image iterations (PI) with SNR-dependent kernel variance, and PI with heuristic determination of the kernel variance. Figure 6 and Figure 7 show the results of kernel PCA with normalized iterative pre-image computation as given in Equation (24), of PI with SNR-dependent setting of the kernel variance (PIcSNR), and of PI with heuristic determination of the kernel variance (PID).
For kPCA and PIcSNR, the choice of a suitable value for the kernel variance and the regularization parameter η is based on the performance in terms of the perceptual evaluation of audio source separation (PEASS) scores on the development set.

Summary

Introduction

Speech enhancement is important in the field of speech communications and speech recognition. Kernel methods transform data samples by mapping them from the input space to the so-called feature space. We call this pre-image iterations (PI) for speech enhancement, as the reconstructed sample in input space is called the pre-image. Besides their relation to subspace methods, PI exhibit a similarity to non-local neighborhood filtering (NF) applied for image de-noising (Buades et al. 2005; Singer et al. 2009). We use an automatic speech recognition (ASR) system to measure the performance on noise-contaminated and subsequently enhanced data. If the principal components of variables are non-linearly related to the input variables, a non-linear feature extractor is more suitable. This is realized by kernel PCA (Mika et al. 1999; Schölkopf and Smola 2002). To project x onto the eigenvectors v_k in F, the following steps are required: (i) compute the kernel matrix K, (ii) compute its eigenvectors α_k and normalize them using (13) and (14), (iii) project the data sample x using (15).
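The three projection steps listed above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reproduction of the paper's Equations (13) to (15), which are not shown on this page: the Gaussian kernel is assumed to be parameterized as k(x, y) = exp(-||x - y||^2 / c) with kernel variance c, the eigenvectors are scaled with the standard kernel PCA normalization so that each v_k has unit norm in F, and centering in feature space is omitted. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, c):
    """Pairwise Gaussian kernel between rows of A and rows of B.

    Assumed form: exp(-||a - b||^2 / c), with c playing the role of the kernel variance.
    """
    sq_dist = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / c)

def kpca_project(X, x, c, n_components):
    """Project sample x onto the leading kernel PCA components of the rows of X."""
    # (i) Kernel matrix of the training samples (no centering in this sketch).
    K = gaussian_kernel(X, X, c)
    # (ii) Eigenvectors of K; eigh returns ascending eigenvalues, so reorder.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[order], eigvecs[:, order]
    # Scale so that lam_k * (alpha_k . alpha_k) = 1, i.e. v_k has unit norm in F.
    alpha = alpha / np.sqrt(lam)
    # (iii) Projection of x: beta_k = sum_i alpha_k[i] * k(x, x_i).
    k_x = gaussian_kernel(x[None, :], X, c)[0]
    return k_x @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))   # toy data standing in for speech feature vectors
beta = kpca_project(X, X[0], c=2.0, n_components=3)
```

With this normalization, the squared projections of the training samples onto component k sum to the k-th eigenvalue of K, which gives a quick sanity check for an implementation.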

Centering

Objective quality measures: For objective evaluation, we use two measures.

Results and discussion

Experiment 1

Experiment 2

Experiment 3

Experiment 4

Conclusion


Journal: SpringerPlus	Publication Date: Jun 4, 2015
Citations: 21	License type: CC BY 4.0

