Abstract
The information in high-dimensional datasets is often too complex for human users to perceive directly. Hence, it may be helpful to use dimensionality reduction methods to construct lower-dimensional representations that can be visualized. The natural question that arises is how to construct a maximally informative low-dimensional representation. We study this question from an information-theoretic perspective and introduce a new method for linear dimensionality reduction. The resulting model, which quantifies informativeness, also allows us to flexibly account for prior knowledge a user may have about the data. This enables us to provide representations that are subjectively interesting. We call the method Subjectively Interesting Component Analysis (SICA) and expect it to be mainly useful for iterative data mining. SICA is based on a model of a user’s belief state about the data. This belief state is used to search for surprising views. The initial state is chosen by the user (it may be empty up to the data format) and is updated automatically as the analysis progresses. We study several types of prior beliefs: if a user only knows the scale of the data, SICA yields the same cost function as Principal Component Analysis (PCA), while if a user expects the data to have outliers, we obtain a variant that we term t-PCA. Finally, scientifically more interesting variants are obtained when a user has more complicated beliefs, such as knowledge about similarities between data points. The experiments suggest that SICA enables users to find subjectively more interesting representations.
Highlights
– The amount of information in high-dimensional data makes it impossible to interpret such data directly
– We study several types of prior beliefs: if a user only knows the scale of the data, Subjectively Interesting Component Analysis (SICA) yields the same cost function as Principal Component Analysis (PCA), while if a user expects the data to have outliers, we obtain a variant that we term t-PCA
– We present three case studies and investigate the practical advantages and drawbacks of our method, which show that it can be meaningful to account for available prior knowledge about the data (Sect. 4)
Summary
The amount of information in high-dimensional data makes it impossible to interpret such data directly. The data can instead be analyzed in a controlled manner by revealing particular perspectives of the data (lower-dimensional data representations) one at a time. This is often done by projecting the data from the original feature space into a lower-dimensional subspace. The optimal background distribution $p_{\mathbf{X}}$ is a product distribution of identical multivariate Normal distributions with mean $\mathbf{0}$ and covariance matrix $\sigma^2 \mathbf{I}$. This is summarized in the following theorem. Theorem 1: Given prior belief (10), the MaxEnt background distribution is $p_{\mathbf{X}}(\mathbf{X}) = \prod_{i=1}^{n} p_{\mathbf{x}}(\mathbf{x}_i)$ (12), where $p_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi\sigma^2)^d}} \exp\left(-\frac{\|\mathbf{x}\|^2}{2\sigma^2}\right)$. We further show (Theorem 3) that the optimal solution of problem (30) is a matrix normal distribution $\mathcal{MN}_{n \times d}$ with mean matrix $\mathbf{M}$.
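To make the stated link with PCA concrete, the following minimal Python sketch (not the authors' implementation; the value of sigma, the toy data, and helper names such as neg_log_background are illustrative assumptions) checks numerically that, under the Gaussian MaxEnt background with mean 0 and covariance σ²I encoding only a scale prior, the most surprising unit-norm projection direction coincides with the leading principal component, i.e. the PCA cost function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy n x d data matrix whose columns have clearly different scales.
X = rng.normal(size=(500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2])
sigma = 1.0  # assumed scale used in the background model (illustrative choice)

def neg_log_background(projected, sigma):
    """Negative log-density (surprisal) of 1-D projected values under the
    background N(0, sigma^2); larger means more surprising to the modeled user."""
    n = projected.size
    return n * 0.5 * np.log(2 * np.pi * sigma**2) + np.sum(projected**2) / (2 * sigma**2)

def most_surprising_direction(X):
    """For the zero-mean Gaussian background, the surprisal of X @ w is, up to
    constants, sum_i (w^T x_i)^2 / (2 sigma^2); maximizing it over unit-norm w
    gives the leading eigenvector of X^T X, i.e. the first principal direction."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]

w = most_surprising_direction(X)
axis = np.zeros(5); axis[3] = 1.0  # an arbitrary coordinate axis for comparison
print("surprisal of leading principal direction:", neg_log_background(X @ w, sigma))
print("surprisal of a fixed coordinate axis    :", neg_log_background(X @ axis, sigma))
```

The first printed surprisal is the larger one, reflecting that the direction of largest variance is the most informative view when the user's prior belief covers only the scale of the data.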