Abstract

Throat microphones are more robust to environmental noise than conventional acoustic microphones, such as close-talk microphones, because they pick up speech through skin vibrations rather than air transmission. They cannot, however, be used directly in conventional speech recognition systems because their acoustic characteristics differ substantially from those of acoustic microphones. In this study, we propose a deep neural network (DNN)-based feature mapping method for throat-microphone speech recognition. To exploit the large amount of training data recorded with acoustic microphones and to effectively reduce the acoustic mismatch between the throat and acoustic microphones, we use bottleneck features to mediate between the two. Evaluation on a large-vocabulary recognition task of Japanese free conversation showed that the proposed system achieved a 45.8% relative reduction in character error rate (75.5% → 40.9%) compared with a standard MFCC system trained on the acoustic-microphone data.
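
The abstract does not give implementation details, but the described mapping can be sketched as a regression DNN that takes spliced throat-microphone MFCC frames and predicts the bottleneck features extracted from a network trained on acoustic-microphone data. The sketch below is a minimal, hypothetical PyTorch rendering of that idea; the layer sizes, context window, bottleneck dimensionality, sigmoid activations, and mean-squared-error objective are all assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of DNN-based feature
# mapping: a feedforward network maps throat-mic MFCC frames to bottleneck
# features computed from parallel acoustic-mic recordings.
import torch
import torch.nn as nn

FRAME_DIM = 39   # assumed: 13 MFCCs + deltas + delta-deltas per frame
CONTEXT = 11     # assumed: +/-5 frames of context spliced together
BN_DIM = 40      # assumed bottleneck dimensionality

class FeatureMapper(nn.Module):
    """Maps spliced throat-mic MFCC frames to bottleneck features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FRAME_DIM * CONTEXT, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, BN_DIM),  # regression output: bottleneck features
        )

    def forward(self, x):
        return self.net(x)

def train_step(model, opt, throat_frames, target_bn):
    """One MSE training step against bottleneck targets from the
    acoustic-mic-trained network (assumed objective)."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(throat_frames), target_bn)
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = FeatureMapper()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Dummy batch standing in for parallel throat/acoustic data.
    x = torch.randn(32, FRAME_DIM * CONTEXT)
    y = torch.randn(32, BN_DIM)
    print(train_step(model, opt, x, y))
```

At recognition time, the mapped bottleneck features would replace the throat-mic MFCCs as input to the recognizer trained on acoustic-microphone data, which is what lets the large acoustic-mic training corpus be reused.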
