Fast analysis of scATAC-seq data using a predefined set of genomic regions.

Valentina Giansanti, Davide Cittaro, Ming Tang

doi:10.12688/f1000research.22731.2

Abstract

Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, the pipelines available for scATAC-seq data still require large computational resources. We propose an approach based on pseudoalignment that reduces execution time and hardware requirements at little cost in precision.

Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones produced by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line.

Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when the analysis uses a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows efficient labelling of cell groups based on marker genes.

Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.

Highlights

Analysis of scATAC-seq data has recently been scaled to thousands of cells.
Single cell ATAC-seq data: Single-cell ATAC-seq data for PBMC were downloaded from the 10x Genomics public datasets and include sequences for 10k PBMC from a healthy donor.
Limitations of kallisto-based analysis: At the time of writing, kallisto does not natively support scATAC-seq analysis, though it can be applied to any scRNA-seq technology that supports cellular barcodes (CB) and unique molecular identifiers (UMI).

Summary

Introduction

Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, the pipelines available for scATAC-seq data still require large computational resources. In this work, reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. The approach was subsequently tested on scATAC-seq data for the K562 cell line. We found that kallisto does not introduce biases in the quantification of known peaks, and the cell groups identified are consistent with those identified by the standard method. Cell identification is also robust when the analysis uses a DHS-derived reference in place of de novo identification of ATAC peaks. In conclusion, analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster, and using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
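To make the idea of quantifying reads against a predefined set of regions more concrete, the following is a minimal toy sketch in Python. It is not kallisto's or bustools' implementation (kallisto works on a de Bruijn graph with equivalence classes); it only illustrates the general principle of k-mer based pseudoalignment followed by per-barcode counting. The region names, sequences, barcodes, and the k-mer length are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption; real tools use much longer k-mers)


def build_kmer_index(regions, k=K):
    """Map every k-mer to the set of reference regions that contain it."""
    index = defaultdict(set)
    for name, seq in regions.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k=K):
    """Return the regions compatible with all indexed k-mers of the read.

    k-mers absent from the index are skipped; the result is the
    intersection of the region sets of the remaining k-mers.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


def count_cells_by_regions(reads, index):
    """Build a sparse cell-by-region count table from (barcode, sequence) pairs.

    Only reads compatible with exactly one region are counted.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, seq in reads:
        regions = pseudoalign(seq, index)
        if len(regions) == 1:
            counts[barcode][next(iter(regions))] += 1
    return {bc: dict(per_region) for bc, per_region in counts.items()}


# Hypothetical predefined regions (stand-ins for DHS-derived sequences).
regions = {
    "dhs1": "ACGTACGTTGCA",
    "dhs2": "TTTTGGGGCCCCAAAA",
}
# Hypothetical reads, each tagged with a cell barcode.
reads = [
    ("cellA", "ACGTACG"),
    ("cellA", "GGGGCCCC"),
    ("cellB", "TTGCA"),
    ("cellB", "NNNNNNN"),  # shares no k-mer with the reference, so it is dropped
]

index = build_kmer_index(regions)
counts = count_cells_by_regions(reads, index)
print(counts)
```

Because the reference is fixed in advance, no peak calling is needed before counting; the resulting cell-by-region table is the kind of input that downstream clustering would consume. Discarding reads compatible with several regions is a simplification made here for brevity.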



Journal: F1000Research
Publication Date: May 28, 2020
Citations: 13
License type: CC BY 4.0

