Abstract

The multinomial distribution and the Dirichlet Compound Multinomial (DCM) are widely used to model count data. However, recent research has shown that the Dirichlet is not the best choice of prior for the multinomial. We propose a novel model, the Multinomial Scaled Dirichlet (MSD) distribution, which is the composition of the scaled Dirichlet distribution and the multinomial. Moreover, to improve computational efficiency in high-dimensional spaces, we propose to approximate the MSD as a member of the exponential family (EMSD). We evaluate the proposed models through extensive empirical experiments on challenging applications, namely text classification, facial expression recognition, and texture image clustering. The results show that the proposed model and its approximation achieve higher accuracy than state-of-the-art generative models for count data clustering, while the EMSD approximation is many times faster than the corresponding MSD.
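To make the compound construction concrete, the following is a minimal generative sketch rather than the authors' implementation. It assumes the common characterization of the scaled Dirichlet as a vector of independent gamma draws, with shape `alpha[d]` and rate `beta[d]`, normalized to sum to one; the paper's own parameterization may differ. The function names and the example parameter values are hypothetical.

```python
import random


def sample_scaled_dirichlet(alpha, beta, rng):
    """Draw one probability vector from a scaled Dirichlet (assumed form).

    Assumption: the vector is obtained by normalizing independent gamma
    draws with shape alpha[d] and rate beta[d]. random.gammavariate takes
    (shape, scale), so the scale passed in is 1 / rate.
    """
    draws = [rng.gammavariate(a, 1.0 / b) for a, b in zip(alpha, beta)]
    total = sum(draws)
    return [g / total for g in draws]


def sample_compound_counts(alpha, beta, n_trials, rng):
    """Draw one count vector: scaled Dirichlet prior, then multinomial."""
    theta = sample_scaled_dirichlet(alpha, beta, rng)
    counts = [0] * len(theta)
    # n_trials categorical draws with probabilities theta form one
    # multinomial sample.
    for idx in rng.choices(range(len(theta)), weights=theta, k=n_trials):
        counts[idx] += 1
    return counts


rng = random.Random(0)
alpha = [2.0, 1.0, 0.5]  # hypothetical shape parameters
beta = [1.0, 2.0, 1.0]   # hypothetical scale (rate) parameters
counts = sample_compound_counts(alpha, beta, n_trials=50, rng=rng)
print(counts)
```

Under this assumed form, setting every entry of `beta` to the same value gives an ordinary Dirichlet draw, so the sampler reduces to the DCM in that case.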
