Abstract

An acoustic model for an embedded speech recognition system must exhibit two desirable features: it must minimize the degradation in recognition performance while solving the memory problem imposed by limited system resources. Moreover, for general speech recognition tasks, context-dependent models such as state-clustered tri-phones are used to guarantee high recognition performance in the embedded system. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as a method of optimizing the acoustic model. The proposed SCTM modeling offers a significant improvement in recognition performance while also providing a solution to sparse training data problems. Moreover, the state-weight quantization method achieves a drastic reduction in model size. However, models constructed in this way alone are insufficient to improve the recognition rate on tasks with high mutual similarity among words, such as the Korean digit recognition task. Hence, we also construct new dedicated HMMs, for all or some of the Korean digits, that have exclusive states while drawing on the same Gaussian pool as the previous tri-phone models. In this paper, we describe the acoustic model optimization procedure for embedded speech recognition systems and the corresponding performance evaluation results.
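To make the tied-mixture idea concrete, the sketch below illustrates how HMM states can share a single Gaussian pool while each state keeps only a compact, quantized vector of mixture weights, which is where the memory savings described above come from. This is a minimal sketch under simplifying assumptions (diagonal-covariance Gaussians and plain uniform 8-bit weight quantization); the names `GaussianPool`, `quantize_weights`, and `state_log_likelihood` are illustrative and not from the paper, and the paper's actual quantization scheme may differ.

```python
import numpy as np

RNG = np.random.default_rng(0)


class GaussianPool:
    """A shared pool of diagonal-covariance Gaussians reused by all tied states."""

    def __init__(self, n_gaussians: int, dim: int):
        self.means = RNG.standard_normal((n_gaussians, dim))
        self.variances = np.ones((n_gaussians, dim))

    def log_densities(self, x: np.ndarray) -> np.ndarray:
        """Log N(x; mu_k, diag(var_k)) for every Gaussian k in the pool."""
        diff = x - self.means
        return -0.5 * np.sum(
            np.log(2.0 * np.pi * self.variances) + diff ** 2 / self.variances,
            axis=1,
        )


def quantize_weights(weights: np.ndarray, n_bits: int = 8):
    """Uniformly quantize mixture weights: store small integer codes plus one
    scale factor instead of full-precision floats (illustrative scheme)."""
    scale = weights.max() / (2 ** n_bits - 1)
    codes = np.round(weights / scale).astype(np.uint8)
    return codes, scale


def state_log_likelihood(x, pool, codes, scale):
    """Tied-mixture state likelihood: the state's (dequantized) weights select
    from the shared Gaussian pool instead of owning its own Gaussians."""
    w = codes.astype(np.float64) * scale   # dequantize
    w = w / w.sum()                        # renormalize
    a = np.log(np.maximum(w, 1e-12)) + pool.log_densities(x)
    m = a.max()                            # log-sum-exp for numerical stability
    return m + np.log(np.exp(a - m).sum())


# Usage: one shared pool, one state's weights quantized to 8 bits.
pool = GaussianPool(n_gaussians=64, dim=39)
weights = RNG.dirichlet(np.ones(64))
codes, scale = quantize_weights(weights)
x = RNG.standard_normal(39)
print(state_log_likelihood(x, pool, codes, scale))
```

Because every tied state stores only the integer weight codes and a scale factor, per-state memory shrinks from a full set of Gaussian parameters to a few bytes per pool component, while the shared pool is stored once for the whole model.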
