Abstract

Principal component analysis (PCA) is one of the most popular tools in multivariate exploratory data analysis. Its probabilistic version (PPCA), based on the maximum likelihood procedure, provides a probabilistic formulation of dimension reduction. Recently, the bilinear PPCA (BPPCA) model, which assumes that the noise terms follow matrix variate Gaussian distributions, has been introduced to deal directly with two-dimensional (2-D) data, thereby preserving the matrix structure of 2-D data, such as images, and avoiding the curse of dimensionality. However, the Gaussian assumption is often violated in real-life applications, where data sets may contain outliers. In order to make BPPCA robust to outliers, in this paper we propose a robust BPPCA model under the assumption of matrix variate t distributions for the noise terms. The alternating expectation conditional maximization (AECM) algorithm is used to estimate the model parameters. Numerical examples on several synthetic and publicly available data sets are presented to demonstrate the superiority of our proposed model in feature extraction, classification and outlier detection.
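
To illustrate why replacing Gaussian noise with t-distributed noise confers robustness, EM-type algorithms such as AECM typically downweight each observation by an expected precision of the form (ν + d)/(ν + δ), where δ is a squared Mahalanobis distance. The vector-variate sketch below is only a hedged illustration of this mechanism, not the paper's actual matrix-variate update; the function and variable names are ours.

```python
import numpy as np

def t_weights(X, mu, Sigma, nu):
    """E-step weights E[tau | x] under a multivariate t model; outliers get small weights."""
    diff = X - mu
    delta = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)  # squared Mahalanobis
    return (nu + X.shape[1]) / (nu + delta)

# Example: an injected outlier receives a much smaller weight than typical points.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
X[0] += 20.0                                    # gross outlier in the first row
w = t_weights(X, X.mean(axis=0), np.cov(X, rowvar=False), nu=4.0)
print(round(w[0], 3), round(w[1:].mean(), 3))   # outlier weight << average weight
```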

Highlights

  • High-dimensional data are increasingly collected for a variety of applications in the real world

  • Principal component analysis (PCA) [4] is arguably the most well-known dimension reduction method for high-dimensional data analysis. It finds the principal eigenvectors corresponding to the largest eigenvalues of the covariance matrix and projects the high-dimensional data onto the low-dimensional subspace spanned by these eigenvectors (a minimal sketch of this procedure follows this list)

  • The difference from the bilinear probabilistic PCA (BPPCA) model is that the noise matrices $E_c$, $E_r$ and $E$ and the latent matrix variate $Z$ in the robust bilinear probabilistic PCA (RBPPCA) model (11) are assumed to follow matrix variate t distributions given by (10), i.e., $E_c \sim T_{d_c, q_r}(\nu, 0_{d_c \times q_r}, \sigma_c^2 I_{d_c}, I_{q_r})$, $E_r \sim T_{q_c, d_r}(\nu, 0_{q_c \times d_r}, I_{q_c}, \sigma_r^2 I_{d_r})$, $E \sim T_{d_c, d_r}(\nu, 0_{d_c \times d_r}, \sigma_c^2 I_{d_c}, \sigma_r^2 I_{d_r})$, $Z \sim T_{q_c, q_r}(\nu, 0_{q_c \times q_r}, I_{q_c}, I_{q_r})$ (a sketch of drawing such matrix variate t noise also follows this list)
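
As referenced in the second highlight, the eigendecomposition view of PCA can be made concrete in a few lines. The following is a minimal sketch, not code from the paper; the names pca_project, X, and n_components are illustrative.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the leading eigenvectors of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)                    # center each variable
    cov = np.cov(X_centered, rowvar=False)             # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]     # indices of the largest eigenvalues
    W = eigvecs[:, top]                                 # d x q principal eigenvectors
    return X_centered @ W                               # n x q low-dimensional projection

# Example: reduce 200 points in 50 dimensions down to 2 dimensions.
X = np.random.default_rng(0).standard_normal((200, 50))
scores = pca_project(X, 2)
print(scores.shape)  # (200, 2)
```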

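The matrix variate t assumption in the last highlight can be illustrated with the common Gaussian scale-mixture construction: a matrix normal draw divided by the square root of a Gamma(ν/2, ν/2) variable. Whether this matches the paper's exact definition of $T_{p,q}$ is an assumption on our part; the function and parameter names below are illustrative.

```python
import numpy as np

def matrix_variate_t(nu, mean, row_cov, col_cov, rng):
    """Draw one matrix with heavy-tailed (t-type) entries via a Gaussian scale mixture."""
    p, q = mean.shape
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)     # tau ~ Gamma(nu/2, rate nu/2)
    L_row = np.linalg.cholesky(row_cov)                 # p x p row covariance factor
    L_col = np.linalg.cholesky(col_cov)                 # q x q column covariance factor
    G = rng.standard_normal((p, q))                     # matrix standard normal draw
    return mean + (L_row @ G @ L_col.T) / np.sqrt(tau)  # small nu -> heavier tails

# Example: a draw shaped like E_c ~ T_{d_c, q_r}(nu, 0, sigma_c^2 I_{d_c}, I_{q_r}).
rng = np.random.default_rng(1)
nu, sigma_c, d_c, q_r = 3.0, 0.5, 5, 2
E_c = matrix_variate_t(nu, np.zeros((d_c, q_r)), sigma_c**2 * np.eye(d_c), np.eye(q_r), rng)
print(E_c.shape)  # (5, 2)
```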

Summary

Introduction

High-dimensional data are increasingly collected for a variety of applications in the real world. Following PPCA, a probabilistic second-order PCA, called PSOPCA, was developed in [10] to directly model 2-D image matrices based on the so-called matrix variate Gaussian distributions. Robust probabilistic models under the assumption of the t distribution have already been developed successfully by a number of researchers in [17,18,19,20,21,22]. Motivated by these facts, we continue this line of work and develop a robust BPPCA model based on matrix variate t distributions to handle 2-D data sets in the presence of outliers.
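
As an illustration of the bilinear (two-sided) reduction that BPPCA-style models build on, the sketch below compresses a d_c × d_r matrix to a q_c × q_r latent matrix using column and row loading matrices. The orthonormal loadings and deterministic projection are simplifying assumptions of ours; in the paper the loadings are instead fitted by maximum likelihood with AECM.

```python
import numpy as np

rng = np.random.default_rng(3)
d_c, d_r, q_c, q_r = 32, 32, 4, 3                  # observed matrix size, latent size

# Illustrative orthonormal column/row loadings (in the model these are estimated).
W_c, _ = np.linalg.qr(rng.standard_normal((d_c, q_c)))
W_r, _ = np.linalg.qr(rng.standard_normal((d_r, q_r)))

X = rng.standard_normal((d_c, d_r))                # one 2-D observation, e.g. an image
Z = W_c.T @ X @ W_r                                # q_c x q_r latent representation
X_hat = W_c @ Z @ W_r.T                            # rank-limited reconstruction
print(Z.shape, X_hat.shape)                        # (4, 3) (32, 32)
```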

Preliminaries
The Model
Estimation of the Parameters
Numerical Examples
Conclusions