Noisy Speech Samples Research Articles

Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the benefit of using DVAEs rather than the VAE for modeling speech spectrograms. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised, noise-agnostic setup that requires neither noise samples nor noisy speech samples at training time, only clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, thereby exploiting both unsupervised representation learning and dynamics modeling of speech signals. We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization (NMF), and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. The algorithm is presented with the most general DVAE formulation and is then applied to three specific DVAE models to illustrate the versatility of the framework. Experimental results show that the proposed DVAE-based approach outperforms its VAE-based counterpart, as well as several supervised and unsupervised noise-dependent baselines, especially when the noise type is unseen during training.
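The abstract does not spell out the model equations. As a rough sketch only, the snippet below assumes the zero-mean complex Gaussian formulation that is common in VAE-based unsupervised enhancement: each noisy STFT coefficient is speech plus noise, the speech variance is supplied by the (D)VAE prior, and the noise variance is the nonnegative matrix factorization (NMF) product (WH)[f][n]. Under that assumption, and with all variances held fixed, the clean-speech estimate is a Wiener-style posterior mean. The function names, the toy values, and the omission of any per-frame gain are illustrative assumptions; the paper's actual VEM updates for the latent variables and for W and H are not reproduced here.

```python
from typing import List

RealMatrix = List[List[float]]
ComplexMatrix = List[List[complex]]


def nmf_noise_variance(W: RealMatrix, H: RealMatrix) -> RealMatrix:
    """Noise power spectrogram V_b = W @ H from NMF factors.

    W is F x K (nonnegative spectral templates), H is K x N (nonnegative
    activations), so V_b[f][n] = sum_k W[f][k] * H[k][n].
    """
    num_freqs, rank, num_frames = len(W), len(H), len(H[0])
    return [
        [sum(W[f][k] * H[k][n] for k in range(rank)) for n in range(num_frames)]
        for f in range(num_freqs)
    ]


def wiener_estimate(
    X: ComplexMatrix, speech_var: RealMatrix, noise_var: RealMatrix
) -> ComplexMatrix:
    """Posterior mean E[s | x] for x = s + b, with s and b independent
    zero-mean complex Gaussians: each bin is scaled by v_s / (v_s + v_b).

    Assumes strictly positive variances (no division-by-zero guard).
    """
    return [
        [
            speech_var[f][n] / (speech_var[f][n] + noise_var[f][n]) * X[f][n]
            for n in range(len(X[f]))
        ]
        for f in range(len(X))
    ]


# Toy example: 2 frequency bins x 2 frames, rank-1 noise model.
# speech_var is a hypothetical stand-in for variances a trained decoder would give.
W = [[1.0], [0.5]]
H = [[1.0, 2.0]]
noise_var = nmf_noise_variance(W, H)  # [[1.0, 2.0], [0.5, 1.0]]
speech_var = [[1.0, 2.0], [1.5, 3.0]]
X = [[2 + 2j, 4 + 0j], [1 - 1j, 4j]]
S_hat = wiener_estimate(X, speech_var, noise_var)
print(S_hat)
```

In a VEM setting the speech variance depends on the latent variables, so the gain would typically be averaged over samples from the approximate posterior rather than computed once from a single fixed variance as here.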


Understanding the noise characteristics in order to select appropriate filtering techniques, and thereby obtain sufficiently clear speech samples for Speaker Identification, is one of the challenging tasks in Forensic Acoustics. The speaker's idiosyncratic speech characteristics should not be affected when noise reduction is carried out; otherwise, Speaker Identification becomes highly error-prone. We collected fifty noisy speech samples, reportedly recorded in different modes, from actual crime cases received in the laboratory. The samples are analyzed after being subjected to various filtering techniques and are compared with clean speech mixed with noise collected from the non-speech portion. Distortion of the speech is studied at various stages of filter application in terms of SNR and Speaker Specific Information. Treating the retention of Speaker Specific Information as the primary concern of our study, we work out the limitations of the filtering techniques as a function of the characteristics and intensity level of the noise in the samples. A statistical study is subsequently conducted. Listening tests were conducted to ensure that the perceptual features of the original noisy speech are preserved while applying filters. This work demonstrates the efficiency of noise reduction filters in improving SNR, and how their controlled application preserves speaker-dependent features given the various noise characteristics embedded in speech samples.

Enhancement of speech samples received for examination has long been a challenging problem in Audio Forensics. It is observed that a large number of the recordings received for Speaker Identification in the laboratory require enhancement. Speech is a non-linear time series represented in terms of complex numbers; hence, separating noise from noisy speech in the spectral domain has countless possible solutions.
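The comparison described in the forensic abstract above (clean speech mixed with noise taken from the non-speech portion of a recording, with distortion assessed via SNR) can be illustrated with a short sketch. It assumes mono time-domain samples held in Python lists, a noise segment that has already been cut from the non-speech portion (how that portion is located is not described), and a global rather than segmental SNR. The function names and sample values are hypothetical, and the abstract does not state that the authors computed SNR exactly this way.

```python
import math
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Global SNR in dB between a speech signal and a noise signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(
    speech: Sequence[float], noise_segment: Sequence[float], target_db: float
) -> List[float]:
    """Add noise to clean speech so the mixture has the requested global SNR.

    The noise segment (e.g. cut from a non-speech portion of a recording) is
    repeated or truncated to the speech length, then scaled by alpha so that
    P_speech / (alpha**2 * P_noise) = 10 ** (target_db / 10).
    """
    noise = [noise_segment[i % len(noise_segment)] for i in range(len(speech))]
    alpha = math.sqrt(
        mean_power(speech) / (mean_power(noise) * 10.0 ** (target_db / 10.0))
    )
    return [s + alpha * b for s, b in zip(speech, noise)]


# Toy example with hypothetical sample values.
speech = [1.0, -1.0, 1.0, -1.0]  # mean power 1.0
noise_segment = [0.5, 0.5]       # mean power 0.25
mixture = mix_at_snr(speech, noise_segment, target_db=0.0)
print(mixture)  # [2.0, 0.0, 2.0, 0.0]
residual = [m - s for m, s in zip(mixture, speech)]
print(round(snr_db(speech, residual), 6))  # 0.0
```

Scaling the noise rather than the speech keeps the clean-speech level fixed across SNR conditions, which makes mixtures generated at different target SNRs directly comparable.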
The main objective of a noise cancellation system is to obtain a clean signal with higher speech quality. The presence of noise in speech signals can cause a large performance mismatch in speech processing systems used for Speaker Identification as well as Speech Recognition. Inappropriate noise filtering causes features of the noise to be extracted together with those of the actual speech signal during feature extraction. As a result, the parametric representation used for identification carries a high error rate. The presence of broadband noise and a very low SNR degrade the intelligibility of most of the recorded speech samples. The speaker's idiosyncratic speech characteristics are affected when noise reduction is carried out. Thus the Speaker Identification


Articles published on Noisy Speech Samples

Speech enhancement augmentation for robust speech recognition in noisy environments

Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder

Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders

Noise Reduction Using Neural Lateral Inhibition for Speech Enhancement

An Improved Fully Convolutional Network Based on Post-Processing with Global Variance Equalization and Noise-Aware Training for Speech Enhancement

Examination of Energy Based Voice Activity Detection Algorithms for Noisy Speech Signals

A Signal Subspace Speech Enhancement Approach Based on Joint Low-Rank and Sparse Matrix Decomposition

A speech enhancement approach based on noise classification

Source and system features for phone recognition

Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR

Noisy Speech Recognition Based on RBF Neural Network

Study on the Selection of Specific Filters for Enhancement of Recorded Speech for Speaker Identification

Speech enhancement employing Laplacian-Gaussian mixture

Hearing impaired speech in noisy classrooms
