DevKidCC allows for robust classification and direct comparisons of kidney organoid datasets

Sean B Wilson, Joseph E Powell, Aude Dorison, Jessica M Vanslambrouck, Melissa H Little, Jose Alquicira-Hernandez, Sara E Howden

doi:10.1186/s13073-022-01023-z

Abstract

Background: While single-cell transcriptional profiling has greatly increased our capacity to interrogate biology, accurate cell classification within and between datasets is a key challenge. This is particularly so in pluripotent stem cell-derived organoids which represent a model of a developmental system. Here, clustering algorithms and selected marker genes can fail to accurately classify cellular identity while variation in analyses makes it difficult to meaningfully compare datasets. Kidney organoids provide a valuable resource to understand kidney development and disease. However, direct comparison of relative cellular composition between protocols has proved challenging. Hence, an unbiased approach for classifying cell identity is required.

Methods: The R package, scPred, was trained on multiple single cell RNA-seq datasets of human fetal kidney. A hierarchical model classified cellular subtypes into nephron, stroma and ureteric epithelial elements. This model, provided in the R package DevKidCC (github.com/KidneyRegeneration/DevKidCC), was then used to predict relative cell identity within published kidney organoid datasets generated using distinct cell lines and differentiation protocols, interrogating the impact of such variations. The package contains custom functions for the display of differential gene expression within cellular subtypes.

Results: DevKidCC was used to directly compare between distinct kidney organoid protocols, identifying differences in relative proportions of cell types at all hierarchical levels of the model and highlighting variations in stromal and unassigned cell types, nephron progenitor prevalence and relative maturation of individual epithelial segments. Of note, DevKidCC was able to distinguish distal nephron from ureteric epithelium, cell types with overlapping profiles that have previously confounded analyses. When applied to a variation in protocol via the addition of retinoic acid, DevKidCC identified a consequential depletion of nephron progenitors.

Conclusions: The application of DevKidCC to kidney organoids reproducibly classifies component cellular identity within distinct single-cell datasets. The application of the tool is summarised in an interactive Shiny application, as are examples of the utility of in-built functions for data presentation. This tool will enable the consistent and rapid comparison of kidney organoid protocols, driving improvements in patterning to kidney endpoints and validating new approaches.
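The hierarchical classification and the comparison of relative cell-type proportions described in the abstract can be illustrated with a small sketch. This is not the DevKidCC or scPred API (both are R packages); it is a plain Python toy under stated assumptions: the function names, the probability threshold of 0.7 and the example probabilities are all hypothetical, chosen only to show how a cell can be called at a first tier, refined at a second tier, or left unassigned when no class is confident enough.

```python
from collections import Counter

THRESHOLD = 0.7  # hypothetical cut-off; not a value taken from the paper


def call_label(probabilities, threshold=THRESHOLD):
    """Return the most probable label, or 'unassigned' if it falls below the threshold."""
    label, prob = max(probabilities.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else "unassigned"


def classify_cell(tier1_probs, tier2_probs_by_lineage):
    """Two-tier call: lineage first, then a subtype within that lineage."""
    lineage = call_label(tier1_probs)
    if lineage == "unassigned":
        return ("unassigned", "unassigned")
    tier2_probs = tier2_probs_by_lineage.get(lineage)
    # If no second-tier model exists for this lineage, reuse the lineage label.
    subtype = call_label(tier2_probs) if tier2_probs else lineage
    return (lineage, subtype)


def proportions(labels):
    """Relative proportion of each label in a list of calls."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up class probabilities for four cells (illustrative only).
toy_cells = [
    ({"nephron": 0.9, "stroma": 0.1}, {"nephron": {"DT": 0.8, "PT": 0.2}}),
    ({"nephron": 0.2, "stroma": 0.8}, {"stroma": {"CS": 0.6, "MS": 0.4}}),
    ({"nephron": 0.5, "stroma": 0.5}, {}),
    ({"nephron": 0.95, "stroma": 0.05}, {"nephron": {"DT": 0.1, "PT": 0.9}}),
]

calls = [classify_cell(tier1, tier2) for tier1, tier2 in toy_cells]
lineage_props = proportions([lineage for lineage, _ in calls])
```

Because every dataset would pass through the same models and the same threshold in this scheme, the resulting proportions (including the unassigned fraction) can be compared directly between protocols, which is the property the abstract emphasises.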

Highlights

While single-cell transcriptional profiling has greatly increased our capacity to interrogate biology, accurate cell classification within and between datasets is a key challenge.
One dataset was a recently published high-quality human fetal kidney (HFK) dataset [22] (8,987 cells) that included both medulla and cortex regions and comprised a 96-day male and a 108-day female sample. This dataset contained ureteric epithelium, which had not been thoroughly analysed to this point [47]. These data were combined with data from 17,759 HFK cells ranging from week 11 to 18 of gestation [24] to increase the developmental range of the training set.
Cells from all datasets were integrated using Harmony [56] (Fig. 1A) before performing a supervised clustering and annotation, using the original annotations of each dataset as a guide. This led to a reference dataset containing three ureteric epithelial subpopulations, namely ureteric tip (UTip), outer stalk (UOS) and inner stalk (UIS); four stromal subpopulations, namely stromal progenitor cells (SPC), cortical stroma (CS), medullary stroma (MS) and mesangial cells (MesS); endothelium (Endo); the nephron progenitor cells (NPC); and the nephron, including subpopulations of early nephron (EN), early distal tubule (EDT), distal tubule (DT), Loop of Henle (LOH), early proximal tubule (EPT), proximal tubule (PT) and parietal epithelial cells (PEC).
Fig. 1 Generation of a comprehensive reference to train classification models
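The reference labels listed above form a two-level hierarchy, which can be written down as a simple lookup table. The sketch below is one possible reading of that list, not the package's internal representation: the dictionary layout, the lineage names used as keys and the helper `lineage_of` are illustrative assumptions, and the nephron entry contains only the subpopulations named in the text.

```python
# Two-level reference label hierarchy, transcribed from the list in the text.
# The grouping into a dict and the key names are illustrative assumptions.
REFERENCE_HIERARCHY = {
    "ureteric epithelium": ["UTip", "UOS", "UIS"],
    "stroma": ["SPC", "CS", "MS", "MesS"],
    "endothelium": ["Endo"],
    "nephron progenitor cells": ["NPC"],
    # Only the nephron subpopulations named in the text are listed here.
    "nephron": ["EN", "EDT", "DT", "LOH", "EPT", "PT", "PEC"],
}


def lineage_of(subtype):
    """Return the first-level lineage for a subtype abbreviation, or None if unknown."""
    for lineage, subtypes in REFERENCE_HIERARCHY.items():
        if subtype in subtypes:
            return lineage
    return None
```

For example, `lineage_of("UTip")` returns `"ureteric epithelium"` and `lineage_of("MesS")` returns `"stroma"`; an abbreviation that is not in the table returns `None`.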

Summary

Introduction

While single-cell transcriptional profiling has greatly increased our capacity to interrogate biology, accurate cell classification within and between datasets is a key challenge. This is so in pluripotent stem cell-derived organoids, which represent a model of a developmental system. When coupled with approaches for molecular lineage tagging [1] and computational approaches to analyse pseudotime [2,3,4] and RNA velocity [5, 6], gene expression in complex tissues such as the kidney can be studied at an unprecedented resolution. Despite these advantages, classification of cellular identity remains challenging and variable between datasets, even when analysing similar cellular systems. Cell clusters are commonly defined based upon one or a few known differentially expressed genes rather than their global transcriptional signature. Technical challenges such as batch variation can impact definitive cellular identification.

Methods

Results

Discussion

Conclusion