RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System

Muhammed Zahid Ozturk, K. J. Ray Liu, Min Wu, Chenshu Wu, Beibei Wang

doi:10.1109/taslp.2023.3250846

Abstract

Speech enhancement and separation have been a long-standing problem, especially with the recent advances using a single microphone. Although microphones perform well in constrained settings, their performance for speech separation decreases in noisy conditions. In this work, we propose RadioSES, an audioradio speech enhancement and separation system that overcomes inherent problems in audio-only systems. By fusing a complementary radio modality, RadioSES can estimate the number of speakers, solve the source association problem, separate and enhance noisy speech mixtures, and improve both intelligibility and perceptual quality. We perform millimeter-wave sensing to detect and localize speakers and introduce an audioradio deep learning framework to fuse the separate radio features with the mixed audio features. Extensive experiments using commercial off-the-shelf devices show that RadioSES outperforms a variety of state-of-the-art baselines, with consistent performance gains in different environmental settings. Similar to audiovisual methods, RadioSES provides significant performance improvements (3 dB gains in SiSDR) when compared with the corresponding audio-only method, along with the benefits of lower computational complexity and better privacy preservation.

Full Text