Abstract

Pre-hospital emergency medical service (EMS) tasks are often accompanied by complex and diverse noise interference, which complicates the deployment of automatic speech recognition (ASR)-based medical technologies and hinders efficient and accurate telephone communication. Among the different types of noise distortion, interfering speech is especially disruptive. To address these issues, we aim to develop a technology capable of extracting the intended speech of the target physician from noisy, mixed audio recorded during EMS tasks. In this work, we propose pDenoiser, a monaural personalized speech enhancement (PSE) method based on a real-time neural network that operates in the time domain. By leveraging prior vocalization cues of the emergency physician, pDenoiser selectively enhances target speech components while suppressing noise and nontarget speech components, thereby improving speech quality and speech recognition accuracy under noisy conditions. We evaluate our approach on both public general-domain test sets and our self-collected real-world EMS test sets. The experimental results are promising: our model improves both speech quality and ASR performance under various conditions and outperforms related methods across multiple evaluation metrics. We hope that our methodology will improve EMS efficiency and strengthen protection against nontarget speech during EMS tasks.
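To make the PSE task concrete, the sketch below shows one common way to simulate a training or test input of the kind described above: target speech mixed with an interfering (nontarget) talker and background noise at chosen signal-to-interference and signal-to-noise ratios. This is an illustrative assumption, not the authors' data pipeline; the helper names `scale_to_snr` and `make_pse_mixture` are hypothetical, and the sine waves and Gaussian noise merely stand in for real recordings. In a PSE setup the network would additionally receive a reference recording of the target speaker (the "prior vocalization cues"), which is omitted here.

```python
import numpy as np


def scale_to_snr(reference: np.ndarray, interference: np.ndarray, snr_db: float) -> np.ndarray:
    """Rescale `interference` so that 10*log10(P_ref / P_int) equals `snr_db`."""
    p_ref = np.mean(reference ** 2)
    p_int = np.mean(interference ** 2)
    if p_int == 0:
        raise ValueError("interference signal is silent")
    gain = np.sqrt(p_ref / (p_int * 10 ** (snr_db / 10)))
    return interference * gain


def make_pse_mixture(target, interferer, noise, sir_db, snr_db):
    """Hypothetical helper: mix target speech with a nontarget talker and noise.

    Returns (mixture, clean_target): the mixture is the model input and the
    clean target is what a PSE model is trained to recover.
    """
    n = min(len(target), len(interferer), len(noise))  # crop to common length
    target, interferer, noise = target[:n], interferer[:n], noise[:n]
    interferer = scale_to_snr(target, interferer, sir_db)  # set target-to-talker ratio
    noise = scale_to_snr(target, noise, snr_db)            # set target-to-noise ratio
    return target + interferer + noise, target


# Usage with synthetic stand-ins for 1 s of 16 kHz audio (assumed sample rate).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
target = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for the physician's speech
interferer = 0.8 * np.sin(2 * np.pi * 330 * t)  # stand-in for nontarget speech
noise = rng.standard_normal(sr)                 # stand-in for ambient EMS noise
mixture, clean = make_pse_mixture(target, interferer, noise, sir_db=0.0, snr_db=5.0)
print(mixture.shape, clean.shape)
```

Scaling each interference source relative to the target's power keeps the difficulty of every simulated example controlled by two interpretable knobs (SIR and SNR), which makes it easier to report results per condition.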
