Abstract

We investigated a method for estimating subjective Japanese speech intelligibility using conventional speech recognition systems. Specifically, we attempted to estimate intelligibility scores for the Japanese diagnostic rhyme test (DRT), an intelligibility test in which the listener selects one word from a pair. The forced selection process was simulated with a language model that constrains the speech recognizer to output one of the two words in the pair. DRT words were mixed with Gaussian noise, babble (multi-speaker) noise, and pseudo-speech noise at various SNRs, and the resulting recognition rates were compared with subjective intelligibility scores. When the Japanese DRT was simulated with a speaker-independent phoneme model, the recognition rate was low overall, even for clean speech. Using a speaker-adapted model improved the recognition rate by 20%. In addition, the rate of deterioration from clean speech with the speaker-adapted model was closer to the subjective evaluation results than with the speaker-independent model. However, recognition performance was still insufficient compared to the subjective evaluation results. We are currently working on improving noise tolerance through noise adaptation. We believe this should further improve the recognition rates, bringing the overall accuracy closer to the subjective results.
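
As an illustration only, the two steps described above (mixing a test word with noise at a target SNR, and forcing a choice between the two words of a pair) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the word pair, and the score values are hypothetical, and the recognizer is replaced by a dictionary of per-word scores.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio is `snr_db` dB, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Gain g chosen so that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def forced_choice(scores):
    """Return the word with the higher score; `scores` maps exactly two words to scores."""
    if len(scores) != 2:
        raise ValueError("a DRT trial has exactly two candidate words")
    return max(scores, key=scores.get)

def recognition_rate(trials):
    """Fraction of trials in which the forced choice equals the target word.

    `trials` is a list of (target_word, scores) tuples.
    """
    correct = sum(1 for target, scores in trials if forced_choice(scores) == target)
    return correct / len(trials)

# Hypothetical trials: scores stand in for recognizer output (e.g. log-likelihoods).
trials = [
    ("zai", {"zai": -120.5, "sai": -131.0}),
    ("sai", {"zai": -118.2, "sai": -125.9}),
]
rate = recognition_rate(trials)
```

In a real system, the scores would come from a recognizer whose language model admits only the two words of the pair, and the rate would be computed separately for each noise type and SNR before comparison with the subjective scores.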
