Low SNR Speech Research Articles

Speech enhancement is challenging for audio with a low signal-to-noise ratio (SNR). Existing speech enhancement methods are mainly designed for high-SNR audio and usually use RNNs to model audio sequence features; this leaves the model unable to learn long-distance dependencies, which limits its performance in low-SNR speech enhancement tasks. We design a complex transformer module with sparse attention to overcome this problem. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences. It uses a sparse attention mask to balance the model's attention between long-distance and nearby relations, introduces a pre-layer positional embedding module to enhance the model's perception of position information, and adds a channel attention module that lets the model dynamically adjust the weight distribution between channels according to the input audio. The experimental results show that, in low-SNR speech enhancement tests, our models achieve noticeable improvements in both speech quality and intelligibility.
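
The idea of a sparse attention mask that balances nearby and long-distance relations can be illustrated with a small sketch. The abstract does not specify the mask pattern, so the combination below (a local window plus strided long-range positions) is an assumption chosen for illustration, and the names `sparse_attention_mask`, `masked_softmax`, `window`, and `stride` are hypothetical rather than taken from the paper.

```python
import numpy as np


def sparse_attention_mask(seq_len: int, window: int, stride: int) -> np.ndarray:
    """Boolean mask where True means "query i may attend to key j".

    Assumed pattern (not from the paper): each frame attends to
    neighbours within `window` frames and to frames whose distance
    is a multiple of `stride`.
    """
    idx = np.arange(seq_len)
    dist = np.abs(idx[:, None] - idx[None, :])
    local = dist <= window          # nearby relations
    strided = (dist % stride) == 0  # long-distance relations
    return local | strided


def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax over attention scores, ignoring masked-out keys."""
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; the diagonal is
    # always allowed, so every row has at least one finite entry.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    mask = sparse_attention_mask(seq_len=8, window=1, stride=4)
    rng = np.random.default_rng(0)
    scores = rng.standard_normal((8, 8))
    weights = masked_softmax(scores, mask)
    print(mask.astype(int))
    print(weights.sum(axis=-1))
```

With a small `window` and a larger `stride`, each frame spends most of its attention budget on its neighbours while still reaching distant frames at regular intervals, which is one common way to trade off the two kinds of relation.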

Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when these contain disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed where all target components were removed when local SNRs were smaller than a certain criterion value. It can be expected that for sufficiently high criterion values target speech components will be removed that do contribute to speech intelligibility. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. However, for collocated speakers it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
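
The stimulus manipulation described above, removing target components wherever the local SNR falls below a criterion, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which time-frequency representation was used, so the inputs here are generic time-frequency arrays of the target and the interferers, and the names `discard_low_snr_components`, `target_tf`, `masker_tf`, and `criterion_db` are hypothetical.

```python
import numpy as np


def discard_low_snr_components(target_tf, masker_tf, criterion_db, eps=1e-12):
    """Zero out target time-frequency units whose local SNR is below a criterion.

    target_tf, masker_tf: arrays of the same shape holding (possibly complex)
    time-frequency coefficients of the target and of the summed interferers.
    Returns the processed target and the boolean mask of units that were kept.
    """
    target_power = np.abs(target_tf) ** 2
    masker_power = np.abs(masker_tf) ** 2
    # Local SNR in dB per time-frequency unit; eps avoids division by zero.
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    keep = local_snr_db >= criterion_db
    return np.where(keep, target_tf, 0.0), keep


if __name__ == "__main__":
    target = np.array([[1.0, 0.1], [2.0, 0.5]])
    masker = np.array([[0.5, 1.0], [2.0, 1.0]])
    for criterion in (-10.0, 0.0):
        processed, keep = discard_low_snr_components(target, masker, criterion)
        print(criterion, keep.astype(int).tolist())
```

Raising `criterion_db` removes progressively more of the target; the abstract's question is at which criterion value intelligibility starts to degrade for separated versus collocated speakers.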

Articles published on Low SNR Speech

SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

CST: Complex Sparse Transformer for Low-SNR Speech Enhancement.

Low SNR speech enhancement with DNN based phase estimation

Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

Toeplitz Robust Noisy Speech Endpoint Detection
