Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data

Laura Anderlucci, Cinzia Viroli

doi:10.1007/s11634-020-00399-3

Journal: Advances in Data Analysis and Classification
Publication Date: May 25, 2020
Citations: 8

Affiliation: University of Bologna

Keywords: Mixture of Multinomial Distributions, Unsupervised Classification

Abstract

Topic detection in short textual data is a challenging task because such data are represented as a high-dimensional and extremely sparse document-term matrix. In this paper we focus on the problem of classifying textual data on the basis of their (unique) topic. For unsupervised classification, a popular approach called Mixture of Unigrams models the word counts as a mixture of multinomial distributions, with each component corresponding to a different topic. Placing a Dirichlet prior on the multinomial parameters extends this model to a mixture of compound Dirichlet-Multinomial distributions, which is preferable for sparse data. We propose a gradient descent estimation method for fitting the model, and investigate its supervised and unsupervised classification performance on real empirical problems.
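To make the model in the abstract concrete, the following is a minimal sketch of how a single document's word counts could be scored under a mixture of Dirichlet-Multinomial components and assigned to its most probable topic. It is not the authors' code and does not implement their gradient descent estimation: the parameters are taken as given. The function names (`dm_log_pmf`, `responsibilities`, `classify`) and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a word-count vector under a Dirichlet-Multinomial.

    counts: non-negative integer word counts for one document.
    alpha:  positive Dirichlet parameters, one per vocabulary term.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_j!), computed on the log scale.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma-function terms obtained by integrating out the multinomial
    # probabilities against the Dirichlet prior.
    log_ratio = math.lgamma(a) - math.lgamma(n + a)
    log_terms = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_ratio + log_terms


def responsibilities(counts, weights, alphas):
    """Posterior probability of each topic (mixture component) for one document."""
    log_joint = [
        math.log(w) + dm_log_pmf(counts, al) for w, al in zip(weights, alphas)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_joint)
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def classify(counts, weights, alphas):
    """Index of the topic with the highest posterior probability."""
    post = responsibilities(counts, weights, alphas)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical example: a 4-term vocabulary and 2 topics.
weights = [0.5, 0.5]                # mixing proportions (assumed)
alphas = [
    [2.0, 2.0, 0.1, 0.1],           # topic 0 favours the first two terms
    [0.1, 0.1, 2.0, 2.0],           # topic 1 favours the last two terms
]
doc = [3, 1, 0, 0]                  # short, sparse document
print(classify(doc, weights, alphas))
```

Here the document uses only the first two terms, so it is assigned to the first topic. In the unsupervised setting described in the abstract, the weights and Dirichlet parameters would be estimated from the data rather than fixed by hand.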
