Abstract
Users require speech recognition systems that are both rapid and highly accurate. Accuracy can be improved by unsupervised adaptation such as CMLLR (Constrained Maximum Likelihood Linear Regression). CMLLR-based batch-type unsupervised adaptation estimates a single global transformation matrix from unsupervised labels; unfortunately, it requires a prior labeling pass and so is not rapid. Our proposed technique reduces the prior labeling time by using context-independent phoneme models (monophones) and frame-by-frame statistics accumulation in unsupervised adaptation. It further raises accuracy by including power when accumulating statistics and when performing recognition after adaptation. Simulations using spontaneous speech show that the proposed technique reduced the total computational time of labeling and recognition by 52.2% while matching the recognition rate of the conventional unsupervised adaptation technique, which accumulates statistics using context-dependent phoneme models (triphones).
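To make the idea of frame-by-frame statistics accumulation concrete, the following is a minimal sketch of the standard sufficient statistics used to estimate a global CMLLR transform for diagonal-covariance Gaussians. It is not the authors' implementation: the function name `accumulate_cmllr_stats`, its arguments, and the assumption that per-frame Gaussian occupation probabilities are already available (for example, from a monophone pass) are all hypothetical, and the abstract does not specify how power enters the accumulation.

```python
import numpy as np

def accumulate_cmllr_stats(frames, posteriors, means, variances):
    """Accumulate textbook CMLLR statistics one frame at a time.

    Hypothetical helper for illustration only.
    frames:     (T, D) feature vectors o_t
    posteriors: (T, M) occupation probabilities gamma_m(t), assumed given
    means:      (M, D) Gaussian means
    variances:  (M, D) diagonal Gaussian variances
    Returns beta (total occupancy), G of shape (D, D+1, D+1), k of shape (D, D+1).
    """
    num_frames, dim = frames.shape
    G = np.zeros((dim, dim + 1, dim + 1))
    k = np.zeros((dim, dim + 1))
    beta = 0.0
    for o_t, gamma_t in zip(frames, posteriors):
        xi = np.append(o_t, 1.0)                   # extended vector [o_t; 1]
        outer = np.outer(xi, xi)
        inv_var_w = gamma_t @ (1.0 / variances)    # sum_m gamma_m(t) / sigma^2_{m,i}
        mean_w = gamma_t @ (means / variances)     # sum_m gamma_m(t) mu_{m,i} / sigma^2_{m,i}
        G += inv_var_w[:, None, None] * outer      # one (D+1)x(D+1) matrix per dimension i
        k += mean_w[:, None] * xi                  # one (D+1) vector per dimension i
        beta += gamma_t.sum()
    return beta, G, k
```

Because each frame only adds to `beta`, `G`, and `k`, the statistics can be updated as frames arrive rather than after a full labeling pass; the row-by-row estimation of the transform from these statistics is omitted here.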