Abstract

We propose a method for training acoustic models on generated speech for robust speech recognition with blind sound source separation as a front-end. Multiple-microphone systems are often used for the separation; in such a setup the separated speech is severely distorted, and the recognition rate drops significantly. If we can measure the transmission characteristics from sound sources at various directions to the microphones, we can simulate the mixed speech the microphones would receive when multiple speakers talk with overlap. We then separate the simulated overlapped speech with a blind source separation method such as frequency-domain independent component analysis (FDICA) and use the separated speech to train HMM acoustic models for recognizing such separated speech. Our method can generate large amounts of this distorted speech without recording real speech through the microphone system. We evaluate the resulting models on continuous Japanese speech recognition and show their effectiveness.
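The key data-generation step, simulating what the microphone array receives when overlapping speakers are convolved with measured transmission characteristics, can be sketched as follows. This is a minimal illustration under assumed array shapes and variable names (`simulate_mixture`, `utterances`, `impulse_responses` are hypothetical, not from the paper); the FDICA separation and HMM training stages that would follow are only indicated in comments.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_mixture(utterances, impulse_responses):
    """Simulate what M microphones receive when N speakers talk with overlap.

    utterances        : list of N 1-D float arrays (dry speech, same sample rate)
    impulse_responses : array of shape (N, M, L) -- measured transmission
                        characteristics from each source direction to each mic
    Returns an array of shape (M, T) with the simulated mixed observations.
    """
    n_src, n_mic, ir_len = impulse_responses.shape
    length = max(len(u) for u in utterances) + ir_len - 1
    mics = np.zeros((n_mic, length))
    for s, dry in enumerate(utterances):
        for m in range(n_mic):
            conv = fftconvolve(dry, impulse_responses[s, m])
            mics[m, : len(conv)] += conv  # overlapping speakers simply add at each mic
    return mics

# Example with synthetic stand-ins for real utterances and measured responses:
rng = np.random.default_rng(0)
fs = 16000
utts = [rng.standard_normal(fs), rng.standard_normal(fs)]  # two 1-second "speakers"
irs = 0.01 * rng.standard_normal((2, 2, 512))              # 2 sources x 2 mics x 512 taps
mixed = simulate_mixture(utts, irs)
# `mixed` would next be separated with an FDICA implementation, and the separated
# (distorted) signals used as training data for the HMM acoustic models.
```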
