Abstract
The information in high-dimensional datasets is often too complex for human users to perceive directly. Hence, it may be helpful to use dimensionality reduction methods to construct lower-dimensional representations that can be visualized. The natural question that arises is how to construct a maximally informative low-dimensional representation. We study this question from an information-theoretic perspective and introduce a new method for linear dimensionality reduction. The resulting model, which quantifies informativeness, also allows us to flexibly account for prior knowledge a user may have about the data. This enables us to provide representations that are subjectively interesting. We call the method Subjectively Interesting Component Analysis (SICA) and expect it to be mainly useful for iterative data mining. SICA is based on a model of a user’s belief state about the data. This belief state is used to search for surprising views. The initial state is chosen by the user (it may be empty up to the data format) and is updated automatically as the analysis progresses. We study several types of prior beliefs: if a user only knows the scale of the data, SICA yields the same cost function as Principal Component Analysis (PCA), while if a user expects the data to have outliers, we obtain a variant that we term t-PCA. Finally, scientifically more interesting variants are obtained when a user has more complicated beliefs, such as knowledge about similarities between data points. The experiments suggest that SICA enables users to find subjectively more interesting representations.
Highlights
– The amount of information in high-dimensional data makes it impossible to interpret such data directly
– We study several types of prior beliefs: if a user only knows the scale of the data, Subjectively Interesting Component Analysis (SICA) yields the same cost function as Principal Component Analysis (PCA), while if a user expects the data to have outliers, we obtain a variant that we term t-PCA
– We present three case studies and investigate the practical advantages and drawbacks of our method, which show that it can be meaningful to account for available prior knowledge about the data (Sect. 4)
Summary
The amount of information in high-dimensional data makes it impossible to interpret such data directly. The data can instead be analyzed in a controlled manner by revealing particular perspectives of the data (lower-dimensional data representations) one at a time. This is often done by projecting the data from the original feature space into a lower-dimensional subspace. The optimal background distribution $p_{\mathbf{X}}$ is a product distribution of identical multivariate Normal distributions with mean $\mathbf{0}$ and covariance matrix $\sigma^2 \mathbf{I}$. This is summarized in the following theorem. Theorem 1: Given prior belief (10), the MaxEnt background distribution is $p_{\mathbf{X}}(\mathbf{X}) = \prod_{i=1}^{n} p_{\mathbf{x}}(\mathbf{x}_i)$ (12), where $p_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi\sigma^2)^d}} \exp\left(-\frac{\|\mathbf{x}\|^2}{2\sigma^2}\right)$. We further show (Theorem 3) that the optimal solution of problem (30) is a matrix normal distribution $\mathcal{MN}_{n \times d}$ with mean matrix $\mathbf{M}$.
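To make the stated link with PCA concrete, the following minimal Python sketch (not the authors' implementation; the value of sigma, the toy data, and helper names such as neg_log_background are illustrative assumptions) checks numerically that, under the Gaussian MaxEnt background with mean 0 and covariance σ²I encoding only a scale prior, the most surprising unit-norm projection direction coincides with the leading principal component, i.e. the PCA cost function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy n x d data matrix whose columns have clearly different scales.
X = rng.normal(size=(500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2])
sigma = 1.0  # assumed scale used in the background model (illustrative choice)

def neg_log_background(projected, sigma):
    """Negative log-density (surprisal) of 1-D projected values under the
    background N(0, sigma^2); larger means more surprising to the modeled user."""
    n = projected.size
    return n * 0.5 * np.log(2 * np.pi * sigma**2) + np.sum(projected**2) / (2 * sigma**2)

def most_surprising_direction(X):
    """For the zero-mean Gaussian background, the surprisal of X @ w is, up to
    constants, sum_i (w^T x_i)^2 / (2 sigma^2); maximizing it over unit-norm w
    gives the leading eigenvector of X^T X, i.e. the first principal direction."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]

w = most_surprising_direction(X)
axis = np.zeros(5); axis[3] = 1.0  # an arbitrary coordinate axis for comparison
print("surprisal of leading principal direction:", neg_log_background(X @ w, sigma))
print("surprisal of a fixed coordinate axis    :", neg_log_background(X @ axis, sigma))
```

The first printed surprisal is the larger one, reflecting that the direction of largest variance is the most informative view when the user's prior belief covers only the scale of the data.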