A deep learning-based method is proposed for jointly detecting and localizing speech sources in a complex acoustic scene using the microphones of a hearing aid. Motivated by the human auditory system, peripheral preprocessing is applied to the microphone signals to obtain auditory subband signals, which serve as input to the proposed deep neural network for detecting and localizing speech sources. In the proposed network, a combination of residual and dense aggregation learning is used instead of conventional residual learning, in order to preserve and reuse the spatial representations at the output layers and to improve the gradient flow through the deeper layers during training. The learning curves show that the proposed residual-dense aggregation mapping does improve the speed and accuracy of convergence. The proposed model performs well in joint speech source detection and localization not only with a binaural microphone array (i.e., three channels at each side) but also with a monaural microphone array (i.e., four channels at the right side), despite the short distances between the microphones. The proposed methods also outperform neural networks that operate directly on the STFT components of the binaural or monaural microphone signals. In addition, the proposed models extended with learnable peripheral processing achieve slightly better detection and localization scores than the proposed models using the plain auditory subband signals, for both the binaural and monaural microphone arrays, but only when the learnable peripheral processing is initialized with parameters derived from human peripheral processing.
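To make the residual-dense aggregation idea concrete, the following PyTorch sketch combines dense feature concatenation across layers with an outer residual skip connection. It is a minimal illustration, not the authors' exact architecture: the layer count, growth rate, and 1x1 fusion convolution are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn


class ResidualDenseBlock(nn.Module):
    """Illustrative residual-dense aggregation block (hypothetical sizes).

    Each convolutional layer receives the concatenation of the block input
    and all previous layer outputs (dense aggregation), and the block output
    adds a skip connection from the input (residual mapping).
    """

    def __init__(self, channels: int, growth: int = 16, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(n_layers):
            self.layers.append(
                nn.Sequential(
                    nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                )
            )
            in_ch += growth  # dense aggregation: input width grows per layer

        # 1x1 convolution fuses all aggregated features back to `channels`
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # each layer sees the input plus every earlier feature map
            features.append(layer(torch.cat(features, dim=1)))
        # residual skip around the densely aggregated, fused features
        return x + self.fuse(torch.cat(features, dim=1))


# Example: a batch of auditory subband "images" (batch, channels, freq, time)
block = ResidualDenseBlock(channels=32)
out = block(torch.randn(4, 32, 64, 100))  # output shape matches the input
```

Because every layer retains direct access to the block input and all intermediate feature maps, gradients reach the earlier layers through short paths, which is the property credited above for the faster and more accurate convergence.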