Abstract

Probabilistic latent semantic analysis (PLSA) is a popular data analysis method with the objective to discover the underlying semantic structure of input data. In this work, we describe a method for probabilistic topic analysis in image and text based on a new representation of graph-regularized PLSA (GPLSA). In GPLSA, data entities are mapped to an undirected graph, where similarities between topic compositions on the graph are measured by the divergence between discrete probabilities. Such divergence is essentially incorporated as a graph-regularizer that augments the original PLSA algorithm. Furthermore, we extend the GPLSA algorithms to multiple data modalities based on the connections between data entities of each modality. We propose efficient multiplicative iterative algorithms for GPLSA with three popular regularizers, namely ℓ1, ℓ2 and symmetric KL divergences. In each case, we derive simple efficient numerical solutions that require only matrix arithmetic operations during the optimization. Experimental results demonstrate the efficacy of GPLSA over state-of-the-art methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call