The rapid growth of single-cell transcriptomic technology has produced an increasing number of datasets for both embryonic development and in vitro pluripotent stem cell-derived models. This avalanche of data on pluripotency and lineage specification has made it increasingly difficult to define specific cell types or states in vivo and to compare them with in vitro differentiation. Here we use a set of deep learning tools to integrate and classify multiple datasets. This allows the definition of both mouse and human embryo cell types, lineages and states, thereby maximizing the information that can be obtained from these precious experimental resources. Our approaches build on recent initiatives for large-scale human organ atlases, but here we focus on material that is difficult to obtain and process, spanning early mouse and human development. Using publicly available data for these stages, we test different deep learning approaches and develop a model that classifies cell types in an unbiased fashion while also defining the set of genes the model uses to identify lineages, cell types and states. We then use our models trained on in vivo development to classify pluripotent stem cell models of both mouse and human development, showcasing the importance of this resource as a dynamic reference for early embryogenesis.