Abstract

Lineage tracing and trajectory inference from single-cell RNA-sequencing data hold tremendous potential for uncovering the genetic programs driving development and disease. Single-cell datasets are thought to provide an unbiased view of the diverse cellular architecture of tissues. Sampling bias, however, can skew single-cell datasets away from the cellular composition they are meant to represent. We demonstrate a novel form of sampling bias, caused by a statistical phenomenon related to repeated sampling from a growing, heterogeneous population: the relative growth rates of cells influence the probability that they will be sampled in clones observed across multiple time points. We support our probabilistic derivations with a simulation study and an analysis of a real time-course of T-cell development. We find that this bias can impact fate probability predictions, and we explore how to develop trajectory inference methods that are robust to it. Python source code for generating the simulated datasets and the figures in this manuscript is freely available at https://github.com/rbonhamcarter/simulate-clones. A Python implementation of the extension of the LineageOT method is freely available at https://github.com/rbonhamcarter/LineageOT/tree/multi-time-clones. Supplementary data are available at Bioinformatics online.
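
The bias described above can be illustrated with a small toy calculation. This is a minimal sketch under assumptions that are not taken from the manuscript: each clone starts from one cell and grows deterministically to size exp(rate * t), every cell is captured independently with the same probability at each time point, and sampling does not deplete the population. The function names and parameter values are hypothetical and are not part of the simulate-clones or LineageOT code.

```python
import math


def prob_clone_seen(size, sample_rate):
    """Probability that at least one of `size` cells is captured when each
    cell is sampled independently with probability `sample_rate`."""
    return 1.0 - (1.0 - sample_rate) ** size


def multi_time_clone_fraction(growth_rates, clone_fractions, times, sample_rate):
    """Expected share of each clone type among clones seen at every time point.

    Assumption (toy model): a clone founded by one cell has size
    exp(rate * t) at time t, and sampling at different times is independent.
    """
    weights = []
    for rate, frac in zip(growth_rates, clone_fractions):
        p_all = 1.0
        for t in times:
            size = math.exp(rate * t)
            p_all *= prob_clone_seen(size, sample_rate)
        # Weight the true clone fraction by the chance of being multi-time.
        weights.append(frac * p_all)
    total = sum(weights)
    return [w / total for w in weights]


# Two clone types that are equally common, one growing twice as fast.
observed = multi_time_clone_fraction(
    growth_rates=[0.5, 1.0],
    clone_fractions=[0.5, 0.5],
    times=[2.0, 4.0],
    sample_rate=0.05,
)
print(observed)
```

Under these assumptions, the faster-growing clone type makes up well over half of the clones observed at both time points, even though the two types are equally common among all clones. Setting the two growth rates equal recovers the true 50/50 split.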
