Simplex decompositions for real-valued datasets

Madhusudana Shashanka

doi:10.1109/mlsp.2009.5306224

Abstract

In this paper, we introduce the concept of Simplex Decompositions and present a new Semi-Nonnegative decomposition technique that works with real-valued datasets. The motivation stems from the limitations of topic models such as Probabilistic Latent Semantic Analysis (PLSA), which have found wide use in the analysis of non-negative data beyond text corpora, such as images, audio spectra, and gene array data, among others. The goal of this paper is to remove the non-negativity requirement so that these models can work on datasets with both positive and negative entries. We start by showing that PLSA is equivalent to finding a set of components that define the corners of a simplex within which all datapoints lie. We formalize this intuition by introducing the notion of Simplex Decompositions (PLSA and its extensions are specific examples) and generalize the idea to be applicable to arbitrary real datasets with both positive and negative entries. We present algorithms and illustrate the method with examples.

Full Text