Abstract

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell type-specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve regulatory elements to cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their regulatory elements.

Highlights

  • Gene regulatory elements are critical determinants of tissue and cell type-specific gene expression [1, 2]

  • The derived influence matrix made it possible to examine changes in regulation across neuronal development and map enhancers to specific cell types. Using this cell type-specific atlas of putative regulatory elements (pREs), we found that autism spectrum disorder (ASD) genes are enriched for pREs active in inhibitory interneurons, while developmental delay genes are enriched for pREs active in radial glia

  • We observed that pREs from cell types labeled to specific regions such as the medial ganglionic eminence (MGE) map to regions sampled from the ganglionic eminence (Additional File 1: Fig. S6a). These findings demonstrate that cell types resolved in scATAC-seq data with CellWalker can be used to annotate regulatory elements discovered in bulk ATAC-seq
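
The highlights refer to an influence matrix derived from the network model, but this excerpt does not say how it is computed. As a minimal sketch, assuming a random walk with restart on a graph whose nodes are cell type labels and cells, an influence matrix could be obtained as below. The function name `influence_matrix`, the `restart` value, and the toy edge weights are all illustrative assumptions, not CellWalker's actual construction or parameters.

```python
import numpy as np


def influence_matrix(adjacency, restart=0.5):
    """Random walk with restart on a weighted graph (illustrative sketch).

    Entry [i, j] is the long-run probability that a walker which restarts
    at node i is found at node j.
    """
    A = np.asarray(adjacency, dtype=float)
    # Row-normalize edge weights into transition probabilities.
    row_sums = A.sum(axis=1, keepdims=True)
    P = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    n = A.shape[0]
    # Closed form for all start nodes at once:
    # F = restart * (I - (1 - restart) * P)^-1
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)


# Hypothetical toy graph: nodes 0-1 are labels, nodes 2-5 are cells.
edges = [
    (0, 2, 1.0), (0, 3, 1.0),               # label 0 is linked to cells 2 and 3
    (1, 5, 1.0),                            # label 1 is linked to cell 5
    (2, 3, 1.0), (2, 4, 1.0), (3, 4, 1.0),  # cells 2, 3, 4 resemble each other
    (4, 5, 0.1),                            # cell 4 only weakly resembles cell 5
]
A = np.zeros((6, 6))
for i, j, w in edges:
    A[i, j] = A[j, i] = w  # undirected graph, so the matrix is symmetric

F = influence_matrix(A)

# Assign each cell the label node that receives the most influence from it.
labels = F[2:, :2].argmax(axis=1)
print(labels)
```

In this toy graph, cell 4 has no direct label edge, yet it still receives a label through its similarity to labeled cells. That is the general idea behind propagating sparse annotations over a network; the real method may differ in how edges are weighted and how labels are read out.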

Introduction

Gene regulatory elements are critical determinants of tissue and cell type-specific gene expression [1, 2]. Annotation of putative enhancers, promoters, and insulators has rapidly improved through large-scale projects such as ENCODE [3], PsychENCODE [4], B2B [5], and Roadmap Epigenomics [6]. Both predictions and validations of regulatory elements have been made largely in cell lines or bulk tissues lacking anatomical and cellular specificity [7]. Bulk measurements miss regulatory elements specific to one cell type, especially minority ones [8]. Single-cell genomics is an exciting avenue for overcoming the limitations of bulk tissue studies [8, 9]. However, these technologies struggle with low-resolution measurements featuring high rates of dropout and few reads per cell [8, 9]. These strategies generally fail on scATAC-seq data because there are fewer reads per cell, and the portion of the genome being sequenced is typically much larger.
