Noisy Audio Signals Research Articles

The article formulates the Dictionary Recognition problem, which is relevant to a wide range of applied tasks: word recognition in a noisy audio signal for natural language processing, word recognition in a noisy electromagnetic signal, recognition of visual patterns under limited visibility, and more. The Dictionary Recognition problem is to select a set of words from a given set so as to maximize the classification accuracy of the words in the dictionary without losing semantic representation. The idea is to represent a set of objects (encoded as sequences of symbols or visual sequences) as a k-partite graph, where each part of the graph corresponds to a group of objects with a certain common feature (an equivalence class). The task is to find a set of representatives of the k equivalence classes on which the k-classification accuracy of the classifier H meets certain criteria: (1) maximum classification accuracy; (2) maximin accuracy, meaning that the binary classification accuracy of every pair of objects is not lower than a certain value. The proposed Maximin Algorithm provides k-partite cliques with maximin worst-case classification accuracy and runs in polynomial time (the problem belongs to class P). The Maximal Algorithm provides k-partite cliques with the maximum total weight (this problem is NP-hard). The presented algorithms select a set of representatives that is optimal in terms of classification accuracy for a given classifier and in terms of runtime. The algorithms increase classification accuracy when using classical classification methods, without additional optimization of the classifiers themselves. We tested the algorithms on simulated data and provide an open-source project on GitHub. For support vector machine classifiers, the Maximin and Maximal Algorithms give 4-, 8- and 16-classification accuracy close to the best accuracy (obtained by brute-force enumeration) and better than the median accuracy by more than 20%. Furthermore, the algorithms speed up the selection of representatives by five orders of magnitude compared to the brute-force algorithm, with a slight loss of accuracy.
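The two selection criteria in the abstract above can be illustrated with a small sketch. This is not the Maximin or Maximal Algorithm from the article; it is only the brute-force enumeration that the abstract uses as a baseline, and the function name `select_representatives`, the object labels, and the pairwise accuracy values are all hypothetical, made up for illustration.

```python
from itertools import combinations, product


def select_representatives(classes, acc, criterion="maximin"):
    """Brute-force choice of one representative per equivalence class.

    classes:   list of lists, one list of candidate objects per class
    acc:       dict mapping frozenset({a, b}) to the pairwise (binary)
               classification accuracy of objects a and b
    criterion: "maximin" maximizes the worst pairwise accuracy,
               "total" maximizes the sum of pairwise accuracies
    """
    def score(choice):
        weights = [acc[frozenset(pair)] for pair in combinations(choice, 2)]
        return min(weights) if criterion == "maximin" else sum(weights)

    # Enumerate every k-tuple with exactly one object from each class.
    return max(product(*classes), key=score)


# Hypothetical toy data: 3 classes with 2 candidate objects each.
classes = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
pairwise = {
    ("a1", "b1"): 0.90, ("a1", "b2"): 0.70,
    ("a2", "b1"): 0.80, ("a2", "b2"): 0.60,
    ("a1", "c1"): 0.60, ("a1", "c2"): 0.99,
    ("a2", "c1"): 0.80, ("a2", "c2"): 0.70,
    ("b1", "c1"): 0.80, ("b1", "c2"): 0.55,
    ("b2", "c1"): 0.90, ("b2", "c2"): 0.65,
}
acc = {frozenset(pair): value for pair, value in pairwise.items()}

maximin_choice = select_representatives(classes, acc, "maximin")
total_choice = select_representatives(classes, acc, "total")
print(maximin_choice)  # ('a2', 'b1', 'c1'): worst pairwise accuracy 0.80
print(total_choice)    # ('a1', 'b1', 'c2'): largest total, but worst pair 0.55
```

On this toy data the two criteria pick different representatives, which shows why they are treated as separate problems. The enumeration visits every combination of candidates, so its cost grows exponentially with k; that is the cost the article's algorithms are reported to reduce.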

Voice interactions and voice messages on mobile phones are rapidly growing in popularity. However, the user experience of these services is still worse than desired in noisy environments, especially in multi-talker scenarios, where the phone can only provide low-quality voice recordings. Speech enhancement using only audio as the input remains a grand challenge in these scenarios. In this paper, we address this with the help of emerging acoustic sensing technology. The key insight is that the inaudible acoustic signals emitted by phone speakers can capture the subtle lip movements people make when they speak. Instead of using lip reading only to classify a limited set of voice commands, we further unlock the potential of acoustic sensing and leverage the captured lip information to improve voice recording quality. We propose WaveVoice, a joint audio-sensory deep learning method for end-to-end speech enhancement on mobile phones. The WaveVoice model is structured as an encoder-decoder network in which audio and acoustic sensing data are processed through two separate CNN branches and then fused into a joint network to generate enhanced speech. In addition, to improve performance on new users, a self-supervised learning methodology is developed to adapt the model to extract speaker-specific features. We construct a dataset to train and evaluate WaveVoice. We also perform online tests under various noisy conditions to show the applicability of our system in real-world scenarios. Experimental results show that WaveVoice can effectively reconstruct the target clean speech from noisy audio signals and yields notably superior performance compared with the audio-only encoder-decoder model and state-of-the-art speech enhancement methods. Given its promising performance, we believe that WaveVoice makes a substantial contribution to the advancement of mobile voice input.

Articles published on Noisy Audio Signals

Towards image-based laryngeal videostroboscopy using deep learning-enabled compressed sensing

Self-supervised speech denoising using only noisy audio signals

The Algorithm That Maximizes the Accuracy of k-Classification on the Set of Representatives of the k Equivalence Classes

Sensing to Hear

Denoising Speech Based on Deep Learning and Wavelet Decomposition

Robust North Atlantic right whale detection using deep learning models for denoising

Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD

A hybrid technique for speech segregation and classification using a sophisticated deep neural network

Audio-cough event detection based on moment theory

Application of the dual-tree wavelet transform for digital filtering of noisy audio signals

Parametric Optimization and Analysis of Adaptive Equalization Algorithms for Noisy Speech Signals

FDMSM robust signal representation for speech mixtures and noise corrupted audio signals

An Overview of Bayesian Computational methods for audio signal processing

Sound quality enhancement algorithm based on intelligent estimation of noise patterns

Methods for reducing audible artifacts in a wavelet-based broad-band denoising system

Audio peripheral mixer circuit and method for noise reduction
