Abstract

In this paper, we introduce the concept of Simplex Decompositions and present a new Semi-Nonnegative decomposition technique that works with real-valued datasets. The motivation stems from the limitations of topic models such as Probabilistic Latent Semantic Analysis (PLSA), which have found wide use in the analysis of non-negative data beyond text corpora, including images, audio spectra, and gene array data. The goal of this paper is to remove the non-negativity requirement so that these models can work on datasets with both positive and negative entries. We start by showing that PLSA is equivalent to finding a set of components that define the corners of a simplex within which all data points lie. We formalize this intuition by introducing the notion of Simplex Decompositions (of which PLSA and its extensions are specific examples) and generalize the idea so that it applies to arbitrary real-valued datasets. We present algorithms and illustrate the method with examples.
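
The geometric picture can be made concrete with a small sketch. The code below is an illustration only, not the algorithm presented in the paper: it assumes that each data point is a convex combination of the components, with mixture weights that are non-negative and sum to one. The helper names `random_simplex_weights` and `mix`, and all numeric values, are hypothetical and chosen for the example.

```python
import random


def random_simplex_weights(k, rng):
    """Draw k non-negative weights that sum to 1 (a point on the simplex)."""
    raw = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def mix(components, weights):
    """Convex combination of component vectors: sum_k weights[k] * components[k]."""
    dim = len(components[0])
    return [sum(w * c[i] for w, c in zip(weights, components)) for i in range(dim)]


rng = random.Random(0)
weights = random_simplex_weights(3, rng)

# PLSA-style case: every component is itself a distribution over 3 features,
# so any convex combination of them is again a distribution.
plsa_components = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
plsa_point = mix(plsa_components, weights)

# Real-valued case (assumed form, for illustration): the components may hold
# negative entries, while the mixture weights stay on the simplex.
real_components = [[-1.0, 2.0], [3.0, -0.5], [0.0, -2.0]]
real_point = mix(real_components, weights)
```

In both cases the mixed point lies inside the simplex whose corners are the components; only the constraint on the components themselves changes between the two settings.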
