Abstract
In this paper, we apply kernel PCA to speech enhancement and derive pre-image iterations for the same task. Both methods use a Gaussian kernel. The kernel variance serves as a tuning parameter that has to be adapted to the SNR and the desired degree of de-noising. We develop a method that derives a suitable value for the kernel variance from a noise estimate, so that pre-image iterations can be adapted to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data is corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results for the generalized subspace method, spectral subtraction, and the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods perform similarly to the reference methods. The speech recognition experiments show that utterances processed by pre-image iterations achieve consistently better word recognition accuracy than both the unprocessed noisy utterances and the utterances processed by the generalized subspace method.
Highlights
Speech enhancement is important in the field of speech communications and speech recognition
6.1 Experiment 1: Kernel principal component analysis (PCA), pre-image iterations (PI) with SNR-dependent kernel variance, and PI with heuristic determination of the kernel variance. Figure 6 and Figure 7 show the results of kernel PCA with normalized iterative pre-image computation as given in Equation (24), of PI with SNR-dependent setting of the kernel variance (PIcSNR), and of PI with heuristic determination of the kernel variance (PID)
For kPCA and PIcSNR, the choice of a suitable value for the kernel variance and the regularization parameter η is based on the performance in terms of the perceptual evaluation of audio source separation (PEASS) scores on the development set
Summary
Speech enhancement is important in the field of speech communications and speech recognition. Kernel methods transform data samples by mapping them from the input space to the so-called feature space. We call this pre-image iterations (PI) for speech enhancement, as the reconstructed sample in input space is called the pre-image. Besides their relation to subspace methods, PI exhibit a similarity to non-local neighborhood filtering (NF) as applied to image de-noising (Buades et al. 2005; Singer et al. 2009). We use an automatic speech recognition (ASR) system to measure performance on noise-contaminated and subsequently enhanced data. If the principal components of the variables are non-linearly related to the input variables, a non-linear feature extractor is more suitable. This is realized by kernel PCA (Mika et al. 1999; Schölkopf and Smola 2002). To project x onto the eigenvectors v_k in F, the following steps are required: (i) compute the kernel matrix K, (ii) compute its eigenvectors α_k and normalize them using (13) and (14), (iii) project the data sample x using (15).
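The three projection steps can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: Equations (13) to (15) are not reproduced in this summary, so the sketch assumes the standard kernel PCA convention (each coefficient vector scaled so that its eigenvalue times its squared norm equals 1), a Gaussian kernel of the form exp(-||a - b||^2 / c) with kernel variance c, and no centering in feature space. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, c):
    """Gaussian kernel exp(-||a - b||^2 / c) for all row pairs of A and B.

    The exact kernel form and the role of c are assumptions of this sketch.
    """
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def kpca_fit(X, c, n_components):
    """Steps (i) and (ii): kernel matrix, eigenvectors, normalization."""
    K = gaussian_kernel(X, X, c)               # (i) kernel matrix of the training samples
    eigvals, eigvecs = np.linalg.eigh(K)       # eigh returns eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[idx]                         # leading eigenvalues (positive for distinct samples)
    alpha = eigvecs[:, idx] / np.sqrt(lam)     # (ii) scale so that lam_k * (alpha_k . alpha_k) = 1
    return alpha, lam

def kpca_project(x, X, alpha, c):
    """Step (iii): beta_k = sum_i alpha_k[i] * k(x_i, x)."""
    k_x = gaussian_kernel(X, x[None, :], c)[:, 0]
    return alpha.T @ k_x
```

A call such as `alpha, lam = kpca_fit(X, c=2.0, n_components=3)` followed by `kpca_project(x, X, alpha, c=2.0)` returns one coefficient per retained eigenvector. Centering is left out to keep the sketch short; whether the paper centers the data in feature space cannot be determined from this summary.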