Abstract

Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.

Highlights

  • Factor analysis is a first-line approach for the analysis of high-throughput sequencing data[1,2,3,4], and is increasingly applied in the context of multi-omics datasets[5,6,7,8]

  • Unlike existing factor analysis methods for multimodal data, MEFISTO incorporates the continuous covariate to account for spatio-temporal dependencies between samples, which allows for the identification of both spatio-temporally smooth factors as well as non-smooth factors that are independent of the continuous covariate (Fig. 1a,b)

  • MEFISTO combines factor analysis with the flexible non-parametric framework of Gaussian processes[13] to model spatio-temporal dependencies in the latent space, where each factor is governed by a continuous latent process with a variable degree of smoothness (Supplementary Information)

Read more

Summary

Introduction

Factor analysis is a first-line approach for the analysis of high-throughput sequencing data[1,2,3,4], and is increasingly applied in the context of multi-omics datasets[5,6,7,8]. Prominent domains in which spatio-temporal profiling is used include developmental biology[10], longitudinal profiling in personalized medicine[11] or spatially resolved omics[12] Such designs and datasets pose new analytical challenges and opportunities, including the need to account for spatio-temporal dependencies across samples that are no longer invariant to permutations; deal with imperfect alignment between samples from different data modalities, and missing data; identify inter-individual heterogeneities of the underlying temporal and/or spatial functional modules; and distinguish spatio-temporal variation from non-smooth patterns of variations. Spatio-temporally informed dimensionality reduction could enable more accurate and interpretable recovery of the underlying patterns by leveraging known spatio-temporal dependencies rather than by solely relying on feature correlations To this end, we propose MEFISTO, a flexible and versatile method for addressing these challenges while maintaining the benefits of previous factor analysis models for multimodal data

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call