Abstract

Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a population of general-type data by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model, which handles general-type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection: a preserved principal component must correlate sufficiently with data variance that is uncorrelated with the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
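
The intuition behind selecting the number of principal components can be illustrated with ordinary Gaussian PCA. The sketch below is not SePCA: it uses neither exponential family likelihoods nor empirical Bayesian inference. It simply keeps the components whose variance clearly exceeds an estimated noise floor. The function names, the median-based noise estimate, and the threshold `factor` are all assumptions made for this illustration.

```python
import numpy as np


def make_synthetic(n_samples=500, n_features=10, n_latent=3, noise_std=0.1, seed=0):
    """Generate data lying near an n_latent-dimensional subspace (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Orthonormal loading directions obtained from a QR decomposition.
    q, _ = np.linalg.qr(rng.normal(size=(n_features, n_latent)))
    # Latent standard deviations n_latent, ..., 2, 1 so the signal directions differ in strength.
    scales = np.arange(n_latent, 0, -1, dtype=float)
    latent = rng.normal(size=(n_samples, n_latent)) * scales
    noise = noise_std * rng.normal(size=(n_samples, n_features))
    return latent @ q.T + noise


def select_num_components(X, factor=10.0):
    """Count the principal components whose variance exceeds factor * noise floor.

    Assumption: fewer than half of the directions carry signal, so the median
    eigenvalue of the sample covariance approximates the noise variance.
    """
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    noise_floor = np.median(eigvals)
    return int(np.sum(eigvals > factor * noise_floor))


X = make_synthetic()
k = select_num_components(X)
print("selected number of components:", k)
```

A fixed threshold like this has to be tuned by hand, which is the kind of manual choice that an inference-driven selection rule is meant to avoid.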

