SCALE method for single-cell ATAC-seq analysis via latent feature extraction

Lei Xiong,Tao Jiang,Yanqiu Shao,Kang Tian,Ge Gao,Lei Tang,Qiangfeng Cliff Zhang,Kui Xu,Michael Zhang

doi:10.1038/s41467-019-12630-7

Abstract

Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at the single-cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately characterize scATAC-seq data. We validate SCALE on datasets generated on different platforms with different protocols and with different overall data quality. SCALE substantially outperforms the other tools in all aspects of scATAC-seq data analysis, including visualization, clustering, and denoising and imputation. Importantly, SCALE also generates interpretable features that directly link to cell populations, and can potentially reveal batch effects in scATAC-seq experiments.

Highlights

Single-cell ATAC-seq profiles the chromatin accessibility landscape at single cell level, revealing cell-to-cell variability in gene regulation
SCALE models the input scATAC-seq data x as a joint distribution p(x, z, c), where c is one of K predefined clusters, each corresponding to a component of a Gaussian Mixture Model (GMM), and z is the latent variable obtained by z = μ_z + σ_z ε, where μ_z and σ_z are learned by the encoder network from x and ε is sampled from N(0, I) (ref. 16)
For the K predefined clusters, p(z|c) follows a mixture of Gaussians with a mean μ_c and a variance σ_c for the component corresponding to each cluster c, and p(x|z) is a multivariate Bernoulli distribution modeled by the decoder network (Fig. 1)
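The model description in the highlights can be illustrated with a small sketch. It is not the authors' implementation: it assumes the joint factorizes as p(x|z) p(z|c) p(c), assumes a uniform p(c), uses diagonal Gaussians, and replaces the decoder network with a hypothetical `toy_decoder`. All function names and numeric values are illustrative.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), one coordinate at a time."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def log_gaussian(z, mean, var):
    """Log density of a diagonal Gaussian, used here for p(z|c)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mean, var)
    )


def log_bernoulli(x, probs, eps=1e-10):
    """Log likelihood of binary peaks x under independent Bernoulli probabilities."""
    return sum(
        xi * math.log(p + eps) + (1 - xi) * math.log(1.0 - p + eps)
        for xi, p in zip(x, probs)
    )


def log_joint(x, z, c, decoder, means, variances, log_prior):
    """log p(x, z, c) under the ASSUMED factorization p(x|z) p(z|c) p(c)."""
    return (
        log_bernoulli(x, decoder(z))
        + log_gaussian(z, means[c], variances[c])
        + log_prior[c]
    )


def toy_decoder(z):
    """Hypothetical stand-in for the decoder network: one sigmoid per peak."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]


rng = random.Random(0)
mu_z, sigma_z = [0.5, -1.0], [0.1, 0.2]      # in SCALE these come from the encoder
z = reparameterize(mu_z, sigma_z, rng)

means = [[0.5, -1.0], [-2.0, 2.0]]            # mu_c for K = 2 clusters (made up)
variances = [[1.0, 1.0], [1.0, 1.0]]          # sigma_c for K = 2 clusters (made up)
log_prior = [math.log(0.5), math.log(0.5)]    # uniform p(c), an assumption
x = [1, 0]                                    # two binary peaks

# Score each cluster by its joint log probability and pick the largest.
scores = [log_joint(x, z, c, toy_decoder, means, variances, log_prior) for c in range(2)]
best_cluster = max(range(2), key=lambda c: scores[c])
```

Because the sampled z stays close to μ_z, which coincides with the mean of the first component here, the first cluster receives the higher joint score. In a trained model the same comparison across components is what would tie a cell's latent features to a cluster.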

Summary

Results

The imputation of SCALE could strengthen the distinct patterns of cluster-specific peaks by filling in missing values and removing potential noise (Supplementary Fig. 10). This improves downstream analysis, for example the identification of cell-type-specific motifs and transcription factors by chromVAR. We demonstrated this feature with the Forebrain dataset.

We constructed a dataset by first generating reference scATAC-seq data consisting of three clusters, each containing 100 peaks with no missing values, and then randomly dropping out peaks and introducing noise (Methods, Supplementary Fig. 14a).

In the embedding and clustering results based on the SCALE-extracted features, the cells of each replicate were distributed evenly in the low-dimensional space (Supplementary Fig. 17c). We confirmed this result by checking the top specific peaks for each replicate based on raw data and found no significantly different pattern across replicates.

We could improve the model to explicitly incorporate variables that are designated for the discovery and removal of batch effects and other possible technical variations.
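The construction of the reference-plus-corruption dataset described above (three clusters of 100 peaks each, followed by dropout and noise) can be sketched as follows. This is a minimal illustration, not the procedure from the paper's Methods: the number of cells per cluster, the dropout rate, the noise rate, and the reading of "100 peaks" as 100 cluster-specific peaks are all assumptions, and `simulate_scatac` is a hypothetical helper name.

```python
import random


def simulate_scatac(cells_per_cluster=50, n_clusters=3, peaks_per_cluster=100,
                    dropout_rate=0.5, noise_rate=0.05, seed=0):
    """Build a clean block-structured reference matrix, then corrupt it.

    Returns (reference, corrupted, labels); both matrices are lists of 0/1 rows.
    """
    rng = random.Random(seed)
    n_peaks = n_clusters * peaks_per_cluster
    reference, corrupted, labels = [], [], []
    for cluster in range(n_clusters):
        start = cluster * peaks_per_cluster
        stop = start + peaks_per_cluster
        for _ in range(cells_per_cluster):
            # Reference cell: the cluster's own peaks are all open, no missing values.
            clean = [1 if start <= j < stop else 0 for j in range(n_peaks)]
            noisy = []
            for value in clean:
                if value == 1:
                    # Dropout: an open peak is missed with probability dropout_rate.
                    noisy.append(0 if rng.random() < dropout_rate else 1)
                else:
                    # Noise: a closed peak is spuriously called with probability noise_rate.
                    noisy.append(1 if rng.random() < noise_rate else 0)
            reference.append(clean)
            corrupted.append(noisy)
            labels.append(cluster)
    return reference, corrupted, labels


reference, corrupted, labels = simulate_scatac()
```

Keeping the clean reference alongside the corrupted matrix is what makes such a dataset useful for judging imputation: an imputed matrix can be compared entry by entry against `reference`.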

Methods

Code availability


Journal: Nature Communications	Publication Date: Oct 8, 2019
Citations: 182	License type: open-access
