The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech

Lior Madmoni, Israel Nelken, Shir Tibor, Boaz Rafaely

doi:10.1109/taslp.2021.3084742

Abstract

The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound which is reproduced from these bins is still not clear. It is the aim of this paper to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is performed using a listening experiment, where high DRR bins within a reverberant speech signal have been masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may better indicate the quality of spatial perception, compared to the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and potentially improve spatial perception. This was illustrated with an implementation of directional audio coding that was studied with an additional listening experiment supporting the previously described results.
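To make the per-bin DRR measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the direct and reverberant components are available as separate STFT arrays, as would be the case in a simulated scene; the function name `drr_db` and the regularization constant `eps` are hypothetical.

```python
import numpy as np

def drr_db(direct_stft, reverb_stft, eps=1e-12):
    """Per-bin direct-to-reverberant ratio in dB.

    Assumption: both inputs are complex STFT arrays of identical shape
    (frequency x time), holding the direct and reverberant parts of the
    same signal. eps only guards against division by zero and log(0).
    """
    direct_power = np.abs(direct_stft) ** 2
    reverb_power = np.abs(reverb_stft) ** 2
    return 10.0 * np.log10((direct_power + eps) / (reverb_power + eps))
```

A bin in which the direct component has ten times the magnitude of the reverberant component has a DRR of about 20 dB under this definition.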

Highlights

Reverberation is present in many real-life acoustic scenes, in particular sound in enclosures such as rooms, offices, and auditoria
The importance of direct-sound TF bins for spatial perception was studied in the context of reverberant speech
This was performed with a listening experiment of binaurally reproduced signals that were masked in the TF domain, based on direct-to-reverberant ratio (DRR) thresholds

Summary

INTRODUCTION

Reverberation is present in many real-life acoustic scenes, in particular sound in enclosures such as rooms, offices, and auditoria. This paper aims to provide some insights into what can be considered direct sound in reverberant speech, with respect to spatial perception, in different acoustic environments. Masking is performed for bins with DRR values that are higher than a specified threshold. The effect of this masking on spatial perception is evaluated using a listening test based on MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA). This is studied for four different acoustic environments with different reverberation times and different speaker-listener distances. An application is also presented for upscaling first-order Ambisonics (FOA) signals using Directional Audio Coding (DirAC). This is studied with an additional listening experiment, which supports the insights gained in the first listening experiment.
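The two masking strategies mentioned here, masking by DRR threshold and masking by percentage, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure used in the paper: a per-bin DRR map in dB is assumed to be given, masked bins are simply set to zero, and the percentage is taken relative to all bins in the array. The function names are hypothetical.

```python
import numpy as np

def mask_by_threshold(stft, drr, threshold_db):
    """Return a copy of stft with every bin whose DRR exceeds threshold_db zeroed."""
    masked = stft.copy()
    masked[drr > threshold_db] = 0
    return masked

def mask_by_percentage(stft, drr, percent):
    """Return a copy of stft with the given percentage of bins zeroed,
    starting from the bins with the highest DRR.

    Assumption: percent is relative to the total number of bins in drr.
    """
    n_mask = int(round(drr.size * percent / 100.0))
    masked = stft.copy()
    if n_mask == 0:
        return masked
    # Flat indices of the bins sorted by DRR, highest first.
    order = np.argsort(drr, axis=None)[::-1][:n_mask]
    masked[np.unravel_index(order, drr.shape)] = 0
    return masked
```

With a fixed threshold, the number of masked bins depends on the acoustic scene, since a more reverberant room yields fewer bins above the same DRR value. Percentage-based masking removes the same share of bins in every scene, which is the distinction the listening experiment examines.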

MATHEMATICAL BACKGROUND

Spherical Harmonics Representation of a Direct Sound and Its Reflections

Binaural Reproduction in the Spherical Harmonics Domain

Direct-to-Reverberant Ratio

Power Threshold

DRR Threshold

Consistent Inverse STFT

VALIDITY OF THE TIME-FREQUENCY MASKING

Validity Measures for Time-Frequency Masking

Simulated Acoustic Scenes

Evaluation of the Validity Measures

LISTENING EXPERIMENT

Methodology

Results and Discussion

Extension of the Experiment - Percentage-Based Masking

APPLICATION TO DIRECTIONAL AUDIO CODING

Subjective Analysis of DirAC-Based Binaural Reproduction

Findings

CONCLUSIONS

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing	Publication Date: Jan 1, 2021
Citations: 2	License type: CC BY 4.0
