Abstract

The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges the states into a progression using methodologies propelled by single-cell biology. However, current methods, all of which return a best trajectory, do not adequately assess the statistical significance of noisy patterns, leaving the inferred trajectories uncertain. We introduce a tree dimension test for trajectory presence in multivariate data, consisting of a dimension measure of the Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in time linear in tree size, the tree dimension measure summarizes the extent of branching more effectively than the number of leaves, which is globally insensitive, or the tree diameter, which is indifferent to secondary branches. The test statistic quantifies trajectory presence, and its null distribution is estimated under the null hypothesis of no trajectory in the data. On simulated and real single-cell datasets, the test outperformed the intuitive number-of-leaves and tree-diameter statistics. Next, we developed a measure for the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways, including calcium and Wnt signaling, are the most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are the least so. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.

Highlights

  • Recognizing dynamic transitions between cellular states can generate a deeper understanding of development, disease processes, or environmental response inside a biological system

  • We introduce a statistical framework based on the tree dimension measure Td of the Euclidean MST (EMST) to accomplish the task

  • The tree dimension test (TDT) method is not tied to principal component analysis (PCA); other data preprocessing procedures can be used, as reflected in the design of our accompanying software

Summary

Introduction

Recognizing dynamic transitions between cellular states can generate a deeper understanding of development, disease processes, or environmental response inside a biological system. Trajectory inference methods, often operating on a low-dimensional manifold embedded in a high-dimensional space, employ various strategies to capture a trajectory. Methods such as TSCAN use minimum spanning trees (MSTs) built on cluster centroids to capture a trajectory structure underlying the data [5]. Vandaele et al. developed a method for inferring topological structures in graph data, applicable to trajectory inference [9]. They highlighted some challenges [10]: most methods tend to underestimate the number of leaves in graph representations of trajectories. They also showed that topological information correlates with the performance of consecutive cell trajectory inference algorithms, and that many datasets with a trajectory lack sufficient topological information for effective inference.
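
To make the tree-based quantities discussed here concrete, the following minimal sketch builds a Euclidean MST over a small point set and computes the two baseline statistics named in the abstract, the number of leaves and the tree diameter. It is an illustration only, not the authors' software: the function names (euclidean_mst, number_of_leaves, tree_diameter) are hypothetical, the MST is built with a simple quadratic-time Prim's algorithm, and the tree dimension measure Td itself is not attempted because its definition does not appear in this excerpt.

```python
import math
from collections import defaultdict


def euclidean_mst(points):
    """Build a Euclidean MST with Prim's algorithm (O(n^2) time).

    Returns a list of edges (i, j, length) over indices into `points`.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known distance to the growing tree
    best_from = [-1] * n         # tree vertex that realizes best_dist
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def number_of_leaves(edges):
    """Count the degree-1 vertices of the tree."""
    degree = defaultdict(int)
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(1 for d in degree.values() if d == 1)


def tree_diameter(edges):
    """Length of the longest path in the tree, found by two traversals."""
    if not edges:
        return 0.0
    adj = defaultdict(list)
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))

    def farthest(start):
        # Paths in a tree are unique, so a plain traversal gives distances.
        dist = {start: 0.0}
        stack = [start]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        end = max(dist, key=dist.get)
        return end, dist[end]

    a, _ = farthest(edges[0][0])
    _, diameter = farthest(a)
    return diameter


# A Y-shaped toy point set: one short arm and two longer diagonal arms.
y_points = [(0, 0), (1, 0), (2, 0), (-1, 1), (-2, 2), (-1, -1), (-2, -2)]
y_edges = euclidean_mst(y_points)
print(number_of_leaves(y_edges), round(tree_diameter(y_edges), 3))
```

On this toy input the diameter spans only the two longer arms and takes no account of the short third arm, which illustrates the abstract's remark that tree diameter is indifferent to secondary branches.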
