Abstract

We propose an alternative to the generative classifier, which typically models the class conditionals and class priors separately and then applies Bayes' theorem to compute the posterior distribution of classes given the training set as a decision boundary. Because the support vector machine (SVM) is not a probabilistic framework, it is difficult to implement a discriminative classifier based directly on posterior distributions. Since the SVM lacks a full Bayesian analysis, we propose a hybrid (generative–discriminative) technique in which generative topic features obtained through Bayesian learning are fed to the SVM. The standard latent Dirichlet allocation (LDA) topic model, with its Dirichlet (Dir) priors, can be characterized as a Dir–Dir topic model, reflecting the Dirichlet distributions placed on the document and corpus parameters. Using highly flexible conjugate priors to the multinomial, namely the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions, we define two new topic models: BL–GD and GD–BL. We take advantage of the geometric interpretation of these generative (latent) topic models, which associate a K-dimensional manifold (where K is the number of topics) embedded in a V-dimensional feature space (the word simplex), where V is the vocabulary size. Under this structure, the low-dimensional topic simplex (the subspace) represents each document as a single point on the manifold and associates it with a single probability vector over topics. The SVM, with its kernel trick, then operates on these document probabilities for classification, using the maximum-margin learning approach to form the decision boundary. The key observation is that points (documents) that are close to each other on the manifold should belong to the same class. Experimental results on text documents and images demonstrate the merits of the proposed framework.
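To make the pipeline concrete, the sketch below shows the hybrid generative–discriminative scheme in Python with scikit-learn. Several hedges apply: standard LDA (the Dir–Dir model) stands in for the proposed BL–GD and GD–BL models, since GD and BL priors are not available in off-the-shelf libraries, and the corpus, topic count K = 20, and RBF-kernel SVM hyperparameters are illustrative assumptions rather than the paper's experimental setup.

```python
# Hybrid generative-discriminative pipeline (sketch).
# Standard LDA (Dir-Dir) stands in for the proposed BL-GD / GD-BL models;
# its per-document topic proportions are the generative features for the SVM.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a small two-class text corpus (illustrative choice, not the paper's data).
data = fetch_20newsgroups(subset="all",
                          categories=["sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0)

K = 20  # number of topics: dimensionality of the topic simplex (assumed)

pipeline = Pipeline([
    ("counts", CountVectorizer(max_features=5000, stop_words="english")),
    # Generative step: each document becomes a point on the K-topic simplex,
    # i.e., a probability vector of topic proportions.
    ("lda", LatentDirichletAllocation(n_components=K, random_state=0)),
    # Discriminative step: maximum-margin classification with a kernel
    # operating on the topic proportions.
    ("svm", SVC(kernel="rbf", C=10.0, gamma="scale")),
])

pipeline.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))
```

An RBF kernel on the topic proportions is used here purely for simplicity; a kernel that respects the simplex geometry (for example, one based on geodesic distance under the Fisher information metric) would match the manifold intuition above more closely.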
