Abstract
The Karhunen–Loève transform is a well-known technique for orthonormally mapping features into an uncorrelated space. The Gaussian mixture model (GMM) with diagonal covariance matrices is a popular technique for modeling speech feature distributions. These two techniques can be combined to improve the performance of speaker or speech recognition systems. The drawback of this combination is that the two sets of parameters are not optimized jointly. This paper presents a new model structure that integrates the orthonormal transformation and the diagonal-covariance Gaussian mixture into a unified framework. All parameters of this model are estimated simultaneously by maximum-likelihood estimation. This idea is further extended to obtain a new GMM with generalized covariance matrices (GC-GMM). The traditional GMM with diagonal or full covariance matrices is a special case of the GC-GMM. The proposed method is demonstrated on a 100-person connected-digit database for text-independent speaker identification. In comparison with the traditional GMM, the computational complexity and the number of parameters can be greatly reduced with no degradation in system performance.
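For orientation, the following is a minimal sketch of the traditional two-stage baseline the abstract contrasts with: a Karhunen–Loève transform estimated from the global feature covariance, followed by a separately trained diagonal-covariance GMM. It is not the paper's unified GC-GMM estimation; the function and variable names are illustrative assumptions.

```python
# Sketch of the conventional KLT + diagonal-GMM pipeline (two separately
# optimized stages), for contrast with the paper's joint ML estimation.
import numpy as np
from sklearn.mixture import GaussianMixture

def klt_then_diagonal_gmm(features, n_components=8, seed=0):
    """features: (n_frames, n_dims) array of speech feature vectors."""
    # Karhunen–Loève transform: project onto the orthonormal eigenvectors
    # of the global covariance matrix to decorrelate the features.
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)      # orthonormal basis
    transformed = centered @ eigvecs      # globally decorrelated features

    # Diagonal-covariance GMM fitted on the transformed features.
    # The transform and the GMM are optimized separately here, which is
    # the drawback the paper's unified framework is meant to remove.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(transformed)
    return eigvecs, gmm
```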