Abstract

This study investigates the use of non-conventional body-conductive acoustic sensors in human-to-human speech communication and in automatic speech recognition. Body-conductive sensors are attached directly to the speaker and pick up the uttered speech through the skin and bones, which makes them more robust against environmental noise. A throat microphone, an ear bone microphone, and a standard microphone were evaluated using subjective speech intelligibility tests and automatic speech recognition experiments. In addition to using each sensor on its own, several sensor-integration methods were applied to achieve higher recognition rates: multi-stream hidden Markov model (HMM) decision fusion and late fusion. Late fusion achieved a 40% relative improvement in recognition rate in a noisy environment and a 24% relative improvement in a clean environment. For late fusion, a novel adaptive weighting method was introduced that does not require any pre-adjustment of the weights. The study also presents a technique for automatically segmenting noisy speech data by recording with a body-conductive sensor in conjunction with the desired microphone. Finally, the Lombard effect was investigated in the context of body-conductive acoustic sensors.
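In late fusion, each sensor's recognizer scores the candidate hypotheses independently and the scores are combined only afterwards. The following minimal sketch illustrates that general idea with a single fixed stream weight. It assumes that each recognizer outputs a log-likelihood per candidate word. The function name `late_fusion`, the `weight` parameter, and the example scores are hypothetical. The adaptive weighting method described above, which removes the need to tune this weight in advance, is not reproduced here.

```python
from typing import Dict


def late_fusion(scores_a: Dict[str, float],
                scores_b: Dict[str, float],
                weight: float = 0.5) -> str:
    """Pick the hypothesis with the highest weighted sum of log-likelihoods.

    Hypothetical sketch of generic late fusion with a FIXED weight (not the
    paper's adaptive weighting).
    scores_a, scores_b: per-hypothesis log-likelihoods from two independently
        trained recognizers, e.g. a standard mic and a throat mic.
    weight: weight for stream A; stream B receives (1 - weight).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Only hypotheses scored by both recognizers can be fused.
    common = scores_a.keys() & scores_b.keys()
    if not common:
        raise ValueError("the recognizers share no hypotheses")
    fused = {h: weight * scores_a[h] + (1.0 - weight) * scores_b[h]
             for h in common}
    # Return the hypothesis with the highest fused score.
    return max(fused, key=fused.get)


# Hypothetical scores: the noisy standard mic slightly prefers "no",
# while the noise-robust throat mic clearly prefers "yes".
standard_mic = {"yes": -120.0, "no": -118.0}
throat_mic = {"yes": -95.0, "no": -110.0}

print(late_fusion(standard_mic, throat_mic, weight=0.3))  # yes
print(late_fusion(standard_mic, throat_mic, weight=1.0))  # no
```

Setting `weight` to 1.0 falls back to the standard microphone alone. Lowering it shifts trust toward the body-conductive sensor. An adaptive scheme would presumably set this trade-off automatically instead of relying on a hand-tuned constant.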

