Abstract
Earphones have become a popular device for voice input and interaction. However, airborne speech is susceptible to ambient noise, making it necessary to improve the quality and intelligibility of earphone-captured speech in noisy conditions. Since the dual-microphone structure (i.e., outer and in-ear microphones) has been widely adopted in earphones (especially ANC earphones), we design EarSpeech, which exploits in-ear acoustic sensing as a complementary modality to enable airborne speech enhancement. The key idea behind EarSpeech is that in-ear speech is less sensitive to ambient noise and is correlated with airborne speech. However, due to the occlusion effect, in-ear speech has limited bandwidth, making it challenging to correlate directly with full-band airborne speech. We therefore exploit the occlusion effect to carry out theoretical modeling and quantitative analysis of this cross-channel correlation, and we study how to leverage it for speech enhancement. Specifically, we design a series of techniques, including data augmentation, deep learning-based fusion, and a noise mixture scheme, to improve the generalization, effectiveness, and robustness of EarSpeech, respectively. Lastly, we conduct real-world experiments to evaluate the performance of our system. EarSpeech achieves average improvement ratios of 27.23% and 13.92% in PESQ and STOI, respectively, and significantly improves SI-SDR by 8.91 dB. Benefiting from data augmentation, EarSpeech achieves comparable performance with a small-scale dataset that is 40 times smaller than the original one. In addition, we validate generalization across different users, speech content, and languages, as well as robustness in real-world conditions, via comprehensive experiments. An audio demo of EarSpeech is available at https://github.com/EarSpeech/earspeech.github.io/.
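To make the band-limited cross-channel correlation concrete, below is a minimal sketch (our own illustration, not the authors' code) that low-pass filters a synthetic airborne signal to mimic the occlusion-limited in-ear channel and then measures magnitude-squared coherence per frequency. The sample rate, the 2 kHz cutoff, and the synthetic signals are assumptions for illustration, not values from the paper.

# Sketch: the occlusion effect band-limits in-ear speech, so it should
# cohere with airborne speech mainly at low frequencies.
import numpy as np
from scipy.signal import butter, sosfilt, coherence

fs = 16000                                  # assumed sample rate (Hz)
rng = np.random.default_rng(0)
airborne = rng.standard_normal(2 * fs)      # stand-in for the outer mic

# Mimic the occlusion effect: low-pass the airborne signal (assumed
# 2 kHz cutoff) and add sensor noise to form a synthetic in-ear channel.
sos = butter(4, 2000, btype="low", fs=fs, output="sos")
in_ear = sosfilt(sos, airborne) + 0.05 * rng.standard_normal(2 * fs)

# Magnitude-squared coherence quantifies per-frequency correlation.
f, Cxy = coherence(airborne, in_ear, fs=fs, nperseg=1024)
print(f"mean coherence below 2 kHz: {Cxy[f <= 2000].mean():.2f}")
print(f"mean coherence above 2 kHz: {Cxy[f > 2000].mean():.2f}")

Under these assumptions, coherence is high below the cutoff and drops sharply above it, which mirrors why the two channels correlate only in the low band and why the high band must be recovered by the learned fusion model.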