V-Speech

Héctor A Cordourier Maruri, Lama Nachman, Jonathan Huang, Paulo Lopez-Meyer, Hong Lu, Willem Marco Beltman

doi:10.1145/3287058

Abstract

Smart glasses are often used in public environments or industrial scenarios that are relatively noisy. Background noise and sound from competing speakers degrade voice communication and the performance of automatic speech recognition (ASR). Typically, signal processing techniques are used to reduce noise and enhance voice quality, but they are limited in performance and by their hardware and/or computing requirements. Voice capture techniques using bone conduction on the head have been proposed in some experimental and commercial devices; they are robust against environmental noise but limited by signal distortions inherent to the capture method. We present V-Speech, a novel sensing and signal processing solution that enables speech recognition and human-to-human communication in very noisy environments. It captures the voice signal with a vibration sensor located in the nasal pads of smart glasses and transforms the sensor signal to mimic that of a regular microphone in low-noise conditions. The signal transformation is key, as it eliminates the "nasal distortion" that is introduced for nasal phonemes in the speech-induced vibrations of the nasal bone. The output of V-Speech has low noise, sounds natural, and can be used in voice communication or as input to an off-the-shelf ASR service. We evaluated V-Speech in noise-free and noisy conditions with 30 volunteer speakers uttering 145 phrases, and validated its performance on ASR engines and with assessments of voice quality using the Perceptual Evaluation of Speech Quality (PESQ) metric. In extreme noise conditions, the results show a mean improvement of 50% in Word Error Rate (WER) and of 1.0 on a scale of 5.0 in PESQ. In addition, real-life recordings were made under various representative noise conditions, some with sound pressure levels of 93 dBA, which require hearing protection. Subjective listening tests were conducted according to a modified ITU P.835 approach to determine intelligibility, naturalness, and overall quality. Under these extreme conditions, where V-Speech achieved 30 dB SNR, subjective results show that the speech is intelligible and that its naturalness is rated as fair to good. This enables clear voice communication in challenging work environments, for example in places with industrial, factory, mining, and construction noise. With our proposed smart switching technique between a regular microphone signal and V-Speech, optimal quality can be maintained from low to high noise conditions.
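For readers unfamiliar with the two objective quantities quoted in the abstract, the following Python sketch shows how WER and SNR in decibels are conventionally computed. It is an illustration only and not the authors' evaluation code: the function names `word_error_rate` and `snr_db`, the lowercasing and whitespace tokenization, and the example phrases are all assumptions made here.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are lowercased and split on whitespace; real ASR
    scoring tools usually apply more careful text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical phrases, not taken from the paper's 145-phrase set.
    print(word_error_rate("turn the light on", "turn light off"))
    print(snr_db(1000.0, 1.0))
```

In this hypothetical example the recognizer drops one word and substitutes another, giving two errors against a four-word reference. On the decibel scale, an SNR of 30 dB corresponds to a signal power 1000 times the noise power.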

Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Publication Date: Dec 27, 2018
Citations: 23
