Abstract

Speech perception in cocktail-party scenarios is a central concern for researchers working to improve the design of hearing-aid devices. In this paper, a new unified binaural source separation system is introduced for hearing-impaired (HI) listeners in real-world environments. The proposed model comprises three main procedures, selected according to the nature of the input sound, which may be processed through selective (voluntary) or involuntary auditory attention. First, each input frame of the mixture is preprocessed using sound pressure level (SPL) to classify it as salient or non-salient. If the frame is classified as speech, the attended speech is separated through a binaural speech separation (BSS) procedure driven by auditory selective attention detection derived from electroencephalography (EEG) analysis. If, however, the frame is classified as a salient event, it is sent to the binaural salient event separation (BSES) procedure, which separates the salient sound from the background using acoustic features. Finally, the separated sound is fed to the insertion-gain computation procedure, which amplifies the attended sound according to the listener's individual hearing thresholds. The results of the BSES system show high signal-to-noise-ratio improvement across different salient-event classes using the various extracted features. Systematic evaluation of the proposed system yields substantial intelligibility and quality enhancements for all subjects. The unified binaural source separation system can adapt to different acoustic environments through frame detection. The present model has the potential to be used in smart neuro-steered hearing aids, where the listener is confronted with different sounds and may attend to a particular sound voluntarily or involuntarily.
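The SPL-based frame screening described in the abstract could, in outline, look like the following minimal sketch. The 65 dB saliency threshold, the 20 µPa reference pressure, and the assumption that samples are already scaled in pascals are illustrative choices, not values taken from the paper:

```python
import numpy as np

def frame_spl(frame, ref=20e-6):
    """Sound pressure level of one frame in dB re 20 µPa, from the RMS amplitude."""
    rms = np.sqrt(np.mean(np.asarray(frame, dtype=float) ** 2))
    # Floor the RMS to avoid log of zero on silent frames.
    return 20.0 * np.log10(max(rms, 1e-12) / ref)

def is_salient(frame, threshold_db=65.0):
    """Hypothetical saliency test: a frame is salient if its SPL exceeds a threshold."""
    return frame_spl(frame) > threshold_db
```

In a full pipeline, frames flagged by such a test would then be routed either to the BSS procedure (speech) or to the BSES procedure (salient non-speech events), while non-salient frames could bypass separation entirely.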
