Abstract
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.
Highlights
Factor analysis is a first-line approach for the analysis of high-throughput sequencing data[1,2,3,4], and is increasingly applied in the context of multi-omics datasets[5,6,7,8]
Unlike existing factor analysis methods for multimodal data, MEFISTO incorporates a continuous covariate to account for spatio-temporal dependencies between samples, which allows it to identify both spatio-temporally smooth factors and non-smooth factors that are independent of the continuous covariate (Fig. 1a,b)
MEFISTO combines factor analysis with the flexible non-parametric framework of Gaussian processes[13] to model spatio-temporal dependencies in the latent space, where each factor is governed by a continuous latent process with a variable degree of smoothness (Supplementary Information)
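As a rough illustration of the idea in the last highlight, the following sketch draws two latent factors over a one-dimensional covariate (for example, time) and mixes them into observed features through a linear factor model. It is a minimal sketch under stated assumptions, not MEFISTO's actual implementation: the squared-exponential kernel, the covariance form scale * K + (1 - scale) * I, and all function and parameter names (rbf_kernel, factor_covariance, sample_factor, roughness, scale, lengthscale) are chosen here for illustration only.

```python
import numpy as np

def rbf_kernel(t, lengthscale):
    """Squared-exponential kernel on a 1-D covariate (e.g. time)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def factor_covariance(t, scale, lengthscale):
    """Assumed covariance of one factor: scale * K + (1 - scale) * I.

    scale = 1 gives a fully smooth factor, scale = 0 a factor that is
    independent of the covariate. This form is an illustrative assumption.
    """
    n = t.shape[0]
    return scale * rbf_kernel(t, lengthscale) + (1.0 - scale) * np.eye(n)

def sample_factor(t, scale, lengthscale, rng, jitter=1e-6):
    """Draw one factor from a zero-mean Gaussian with the covariance above."""
    n = t.shape[0]
    cov = factor_covariance(t, scale, lengthscale)
    # A small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + jitter * np.eye(n))
    return chol @ rng.standard_normal(n)

def roughness(z):
    """Mean squared difference between neighbouring samples."""
    return float(np.mean(np.diff(z) ** 2))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)            # 50 samples along the covariate

# Column 0: smooth factor; column 1: non-smooth factor.
Z = np.column_stack([
    sample_factor(t, scale=1.0, lengthscale=0.2, rng=rng),
    sample_factor(t, scale=0.0, lengthscale=0.2, rng=rng),
])

W = rng.standard_normal((10, 2))         # loadings for 10 features
Y = Z @ W.T + 0.1 * rng.standard_normal((50, 10))  # observed data

print("roughness (smooth factor):    ", roughness(Z[:, 0]))
print("roughness (non-smooth factor):", roughness(Z[:, 1]))
```

In this toy setting the smooth factor varies slowly along the covariate while the non-smooth factor behaves like white noise, which is the distinction the model is described as separating. Inference (recovering Z, W and the per-factor smoothness from Y) is not shown.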
Summary
Factor analysis is a first-line approach for the analysis of high-throughput sequencing data[1,2,3,4], and is increasingly applied in the context of multi-omics datasets[5,6,7,8]. Prominent domains in which spatio-temporal profiling is used include developmental biology[10], longitudinal profiling in personalized medicine[11] and spatially resolved omics[12]. Such designs and datasets pose new analytical challenges and opportunities, including the need to account for spatio-temporal dependencies across samples that are no longer invariant to permutations; deal with imperfect alignment between samples from different data modalities and with missing data; identify inter-individual heterogeneities of the underlying temporal and/or spatial functional modules; and distinguish spatio-temporal variation from non-smooth patterns of variation. Spatio-temporally informed dimensionality reduction could enable more accurate and interpretable recovery of the underlying patterns by leveraging known spatio-temporal dependencies rather than relying solely on feature correlations. To this end, we propose MEFISTO, a flexible and versatile method for addressing these challenges while maintaining the benefits of previous factor analysis models for multimodal data.