Abstract

The Gaussian Mixture Model (GMM) has been widely used in speech and image classification tasks. It can be used directly as a classifier or as a representation of speech or image signals. Another important use of the GMM is to serve as the Universal Background Model (UBM) from which speech representations such as the Gaussian Supervector (GSV) and the i-vector are derived. In this paper, we borrow the GSV from speech classification studies and apply it as an image representation for image classification. The GSV is computed from a UBM. Apart from employing the conventional GMM as the UBM to calculate the GSV, we also propose the Equal-Variance GMM (EV-GMM), in which all variables in all Gaussian mixture components share the same variance. Moreover, we derive a kernel version of the EV-GMM, which generalizes it by introducing a kernel. We then compare the GSV with raw image features and other popular image representations such as Sparse Representation (SR) and Collaborative Representation (CR). Experiments on a handwritten digit recognition task show that the GSV performs very well and can even outperform the other popular image representations. In addition, as the UBM, the proposed EV-GMM outperforms the conventional GMM.
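
To make the GSV idea concrete, the following is a minimal Python sketch of the standard GMM-UBM/GSV pipeline: a diagonal-covariance GMM is fitted on pooled background features as the UBM, the UBM means are MAP-adapted to one image's local features, and the adapted means are stacked into a supervector. The function names, the component count, and the relevance factor are illustrative assumptions, not the paper's implementation, and the proposed EV-GMM and kernel EV-GMM variants are not shown here.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64, seed=0):
    # Fit a diagonal-covariance GMM on pooled background feature vectors (T x D)
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=seed)
    ubm.fit(background_feats)
    return ubm

def gaussian_supervector(ubm, feats, relevance=16.0):
    # MAP-adapt the UBM means to one sample's local features and stack them
    post = ubm.predict_proba(feats)            # (T, K) component posteriors
    n_k = post.sum(axis=0) + 1e-10             # soft counts per component
    e_x = (post.T @ feats) / n_k[:, None]      # first-order statistics
    alpha = n_k / (n_k + relevance)            # adaptation coefficients
    adapted = alpha[:, None] * e_x + (1 - alpha[:, None]) * ubm.means_
    # Scale by mixture weight and inverse standard deviation (a common
    # normalization for the supervector kernel), then flatten to one vector
    scale = np.sqrt(ubm.weights_[:, None]) / np.sqrt(ubm.covariances_)
    return (scale * adapted).ravel()

The resulting K*D-dimensional supervector can then be fed to any off-the-shelf classifier, which is how the GSV representation is compared against raw features, SR, and CR in the experiments described above.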
