Abstract

We present a tensor analysis of acoustic models covering multiple speakers in multiple noise conditions, and apply it to adapting speech recognition to new speakers and environments. The bases used in adaptation are constructed by decomposing the training models in the state, feature-dimension, speaker, and noise spaces using multilinear singular value decomposition. An isolated-word recognition experiment demonstrated the effectiveness of the proposed method, which outperformed eigenvoice in babble and factory-floor noise when more than approximately 20 s of adaptation data was available.
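
To illustrate the kind of decomposition the abstract describes, here is a minimal NumPy sketch of a multilinear (higher-order) singular value decomposition of a four-way tensor. It is not the authors' implementation. The axis order (state, feature dimension, speaker, noise), the tensor sizes, and the random data standing in for training model parameters are all assumptions made for illustration, as are the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct`.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # target axis last: (..., I_mode)
    result = moved @ matrix.T               # (..., J)
    return np.moveaxis(result, -1, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding form the basis for that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each mode's basis
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and the per-mode factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical sizes: 3 states, 4 feature dimensions, 5 speakers, 2 noise types.
rng = np.random.default_rng(0)
models = rng.standard_normal((3, 4, 5, 2))  # stand-in for training model means

core, factors = hosvd(models)
state_basis, feature_basis, speaker_basis, noise_basis = factors
```

In this sketch, `speaker_basis` and `noise_basis` play the role of per-mode bases. How the adaptation weights for a new speaker or environment are estimated from such bases is not specified in the abstract and is not shown here.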
