Parallel Factor Analysis 2 (PARAFAC2) is a multimodal factor analysis model suited to multi-way data in which one of the modes has incomparable observation units, for example because of differences in signal sampling or batch sizes. A fully probabilistic treatment of PARAFAC2 is desirable because it improves robustness to noise and provides a principled approach for determining the number of factors. It is challenging, however, because direct model fitting requires the factor loadings to be decomposed into a shared matrix, which specifies how the components are consistently co-expressed across samples, and sample-specific, orthogonality-constrained component profiles. We develop two probabilistic formulations of the PARAFAC2 model, along with variational Bayesian procedures for inference. In the first, the mean values of the factor loadings are orthogonal, which leads to closed-form variational updates. In the second, the factor loadings themselves are orthogonal, which is achieved using a matrix von Mises-Fisher distribution. We compare our probabilistic formulations with the conventional direct fitting algorithm based on maximum likelihood, using synthetic data as well as real fluorescence spectroscopy and gas chromatography-mass spectrometry data, and show that the probabilistic formulations are more robust to noise and to model order misspecification. The probabilistic PARAFAC2 thus forms a promising framework for modeling multi-way data while accounting for uncertainty.
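For orientation, the decomposition that the abstract describes in words is commonly written as below. This is a sketch of the standard PARAFAC2 formulation, not notation taken from the abstract: the symbols $X_k$, $A$, $D_k$, $B_k$, $P_k$, $F$, $E_k$ and the dimensions $I$, $J_k$, $R$ are assumed here for illustration.

```latex
% Assumed notation (not from the abstract): K data slabs X_k, R components.
\begin{align}
  X_k &= A \, D_k \, B_k^{\top} + E_k,
      & X_k &\in \mathbb{R}^{I \times J_k}, \quad k = 1, \dots, K, \\
  B_k &= P_k F,
      & P_k &\in \mathbb{R}^{J_k \times R}, \quad P_k^{\top} P_k = I_R, \\
  B_k^{\top} B_k &= F^{\top} P_k^{\top} P_k F = F^{\top} F
      & &\text{(the same for every } k\text{)}.
\end{align}
% A   : loadings shared across slabs (I x R)
% D_k : diagonal matrix of slab-specific component weights (R x R)
% F   : shared matrix (R x R) fixing the cross-product of the profiles
% P_k : slab-specific matrix with orthonormal columns
% E_k : residual (noise) term
```

Under this reading, $F$ plays the role of the shared matrix and the $P_k$ are the sample-specific orthogonality-constrained profiles. The two probabilistic formulations would then differ in whether the orthogonality constraint is placed on the mean of $P_k$ or on $P_k$ itself.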