Abstract

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic, non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of the genes driving the bifurcation process. We additionally propose an Empirical Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data, and we quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
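To make the generative structure more concrete, the following Python sketch simulates data from a simplified two-branch mixture of factor analyzers and computes the conditional posterior over branch labels, which is the step a Gibbs sampler would repeat at every MCMC iteration. It rests on our own assumptions rather than on details given in this abstract: each cell has a latent pseudotime `t` and a branch label `gamma`, each gene follows a branch-specific linear trend (hypothetical intercepts `eta` and gradients `k`) with Gaussian noise of gene-specific precision `tau`, and the branches have fixed mixing weights `pi`. Priors, the hierarchical structure, zero-inflation, and the updates for all other parameters are omitted.

```python
import numpy as np


def simulate_bifurcation(n_cells=200, n_genes=10, seed=0):
    """Draw toy data from a hypothetical two-branch mixture of factor analyzers.

    Cell i has pseudotime t[i] and branch gamma[i] in {0, 1}. Gene g on branch b
    has mean eta[g, b] + k[g, b] * t[i] and Gaussian noise with precision tau[g].
    """
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, n_cells)
    gamma = rng.integers(0, 2, n_cells)
    eta = np.zeros((n_genes, 2))               # shared intercepts: branches meet at t = 0
    k = rng.normal(0.0, 2.0, (n_genes, 2))     # branch-specific gradients
    tau = np.full(n_genes, 25.0)               # noise precision (sd = 0.2)
    mean = eta[:, gamma].T + k[:, gamma].T * t[:, None]  # (n_cells, n_genes)
    y = mean + rng.normal(size=mean.shape) / np.sqrt(tau)
    return y, t, gamma, eta, k, tau


def branch_posterior(y, t, eta, k, tau, pi=(0.5, 0.5)):
    """Conditional posterior P(gamma[i] = b | y[i], t[i], eta, k, tau) for b = 0, 1."""
    log_post = np.empty((y.shape[0], 2))
    for b in range(2):
        mean = eta[:, b] + np.outer(t, k[:, b])  # (n_cells, n_genes)
        # Gaussian log-likelihood up to a constant shared by both branches
        log_post[:, b] = np.log(pi[b]) - 0.5 * np.sum(tau * (y - mean) ** 2, axis=1)
    log_post -= log_post.max(axis=1, keepdims=True)  # avoid underflow in exp
    p = np.exp(log_post)
    return p / p.sum(axis=1, keepdims=True)


def sample_branches(p, rng):
    """One Gibbs draw of the branch labels from their conditional posterior."""
    return (rng.random(p.shape[0]) < p[:, 1]).astype(int)


y, t, gamma, eta, k, tau = simulate_bifurcation()
p = branch_posterior(y, t, eta, k, tau)
labels = sample_branches(p, np.random.default_rng(1))
print("agreement with true branches:", np.mean(labels == gamma))
```

Because `tau` is shared across branches in this sketch, the Gaussian normalizing constants cancel when the branches are compared, so only the precision-weighted squared residuals enter the log-likelihood. Cells near `t = 0` sit where the branches meet and receive posterior probabilities close to 0.5; a full sampler propagates that ambiguity instead of resolving it by thresholding.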

Highlights

  • Trajectory analysis of single-cell RNA-seq data has become a popular method that attempts to infer lost temporal information, such as a cell’s differentiation state [1,2]

  • We propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers

  • Principal component analysis (PCA) representations of the synthetic data can be seen in Figures 2A and B, showing the characteristic Y shape
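A Y-shaped PCA representation of this kind can be reproduced with a short sketch. The toy generator below (`toy_bifurcation`, a hypothetical helper not claimed to match the simulation behind Figures 2A and B) places cells along a shared trunk that splits into two arms at pseudotime 0.5, and `pca_2d` projects them onto the first two principal components, computed by singular value decomposition of the centred expression matrix.

```python
import numpy as np


def toy_bifurcation(n_cells=300, n_genes=20, seed=0):
    """Hypothetical Y-shaped toy data: a shared trunk that splits at t = 0.5."""
    rng = np.random.default_rng(seed)
    trunk, left, right = rng.normal(size=(3, n_genes))  # random directions in gene space
    t = rng.uniform(0.0, 1.0, n_cells)
    branch = rng.integers(0, 2, n_cells)
    before = np.minimum(t, 0.5)                  # progress along the trunk
    after = np.clip(t - 0.5, 0.0, None)          # progress along the chosen arm
    arm = np.where(branch[:, None] == 0, left, right)  # (n_cells, n_genes)
    x = before[:, None] * trunk + after[:, None] * arm
    x += rng.normal(scale=0.05, size=x.shape)
    return x, t, branch


def pca_2d(x):
    """Scores on the first two principal components and their variance fractions."""
    centred = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:2].T                  # (n_cells, 2)
    explained = s ** 2 / np.sum(s ** 2)          # singular values come sorted, largest first
    return scores, explained[:2]


x, t, branch = toy_bifurcation()
scores, explained = pca_2d(x)
print("variance explained by PC1, PC2:", explained)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by `branch`, should typically show a single stem that divides into two arms.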


Introduction

Trajectory analysis of single-cell RNA-seq (scRNA-seq) data has become a popular method that attempts to infer lost temporal information, such as a cell’s differentiation state [1,2]. Such analyses reconstruct a measure of a cell’s progression through some biological process, known as a pseudotime. Several methods have been proposed to infer bifurcation structure from single-cell data. While one of these, diffusion pseudotime (DPT), arguably has a probabilistic interpretation, none of them specifies a fully generative model that incorporates measurement noise, and each infers bifurcations retrospectively after constructing pseudotimes. A further algorithm, Monocle, learns pseudotimes based on dimensionality reduction using the DDRTree algorithm and provides post hoc inference of genes involved in the bifurcation process using generalized linear models.
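Post hoc gene inference of this kind can be illustrated with nested regression models: for each gene, compare a model with a single trend over pseudotime against one that lets the intercept and slope differ by branch. The sketch below is a simplified Gaussian stand-in, not Monocle's implementation: it uses ordinary least squares instead of a generalized linear model, assumes pseudotimes `t` and branch labels `branch` have already been inferred, and returns an F statistic per gene (the function name `branch_interaction_stat` is hypothetical). Larger values indicate stronger evidence that a gene behaves differently across branches.

```python
import numpy as np


def branch_interaction_stat(y, t, branch):
    """Per-gene F statistic for branch-specific trends (hypothetical helper).

    Reduced model: y_g ~ 1 + t
    Full model:    y_g ~ 1 + t + branch + t:branch
    y has shape (n_cells, n_genes); t and branch have shape (n_cells,).
    """
    n = y.shape[0]
    x_reduced = np.column_stack([np.ones(n), t])
    x_full = np.column_stack([np.ones(n), t, branch, t * branch])

    def rss(x):
        # Least-squares fit of every gene at once; beta has shape (n_params, n_genes)
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        return np.sum((y - x @ beta) ** 2, axis=0)

    rss_r, rss_f = rss(x_reduced), rss(x_full)
    df_extra = x_full.shape[1] - x_reduced.shape[1]
    df_resid = n - x_full.shape[1]
    return ((rss_r - rss_f) / df_extra) / (rss_f / df_resid)


# Toy usage: gene 0 diverges between branches, gene 1 follows one shared trend
rng = np.random.default_rng(1)
n = 300
t = rng.uniform(0.0, 1.0, n)
branch = rng.integers(0, 2, n).astype(float)
y = np.column_stack([2.0 * t * (2 * branch - 1), 1.5 * t])
y += rng.normal(scale=0.3, size=y.shape)
print(branch_interaction_stat(y, t, branch))
```

Because the statistic conditions on point estimates of `t` and `branch`, uncertainty in those estimates is not carried into the gene-level result, which is the practical difference between post hoc testing and inferring gene behaviour within a single probabilistic model.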
