Abstract

Motivation: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.

Results: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.

Availability and implementation: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • During response to stimulation or development, gene expression undergoes significant changes for many genes

  • We provide details of the simulated and real datasets to which we applied GPseudoClust, followed by a summary of our results, with further details in the Supplementary Material

  • We have found that GPseudoClust is more likely to get stuck in local posterior modes than GPseudoRank; e.g. Supplementary Figure S23 shows that, for GPseudoClust, different Markov chain Monte Carlo (MCMC) chains visit different posterior modes

Introduction

During response to stimulation or development, the expression of many genes changes substantially. For bulk measurements of gene expression, these changes can be investigated by collecting time course data. A common analysis step for such datasets is to cluster genes on the basis of similarities in their time course profiles. Eisen et al. (1998) found that similar expression dynamics of genes are related to biological function, whereas Cooke et al. (2011) showed that clustering together genes with similar changes in expression over time can identify those likely to be coregulated by the same transcription factors. Most existing methods for performing such clustering analyses were developed for bulk measurements of gene expression, and not for single-cell data.
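
To make the idea of clustering genes by the similarity of their time course profiles concrete, here is a minimal sketch. It is not GPseudoClust and not the method of any paper cited above: it simply links genes whose profiles have a Pearson correlation above a threshold. The function names, the toy data and the threshold of 0.9 are all assumptions made for illustration. The sketch also assumes that the time points are observed and already in order, which is exactly what single-cell data do not provide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / (sd_x * sd_y)


def cluster_by_profile(profiles, threshold=0.9):
    """Group genes linked by correlation >= threshold (single linkage).

    `profiles` maps a gene name to its expression values at ordered
    time points. The threshold is an illustrative choice.
    """
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                parent[find(g)] = find(h)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: two rising genes and two falling genes.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.2],
}
clusters = cluster_by_profile(toy)
print(clusters)
```

On the toy data the rising genes fall into one group and the falling genes into another. Permuting the time points of a single gene changes its correlations with the others, which gives a rough sense of why an uncertain cell ordering can distort clusters obtained this way.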
