Fast analysis of scATAC-seq data using a predefined set of genomic regions

Valentina Giansanti, Davide Cittaro, Ming Tang

doi:10.12688/f1000research.22731.1

Abstract

Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data was boosted by the implementation of alignment-free techniques, the pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces execution times and hardware needs at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with the ones identified by the standard method. We also found that cell identification is robust when analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows for efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.

Highlights

Recent technological advances in single-cell technologies resulted in a tremendous increase in throughput in a relatively short span of time[1].
Limitations of kallisto-based analysis: At the time of writing, kallisto does not natively support scATAC-seq analysis, though it can be applied to any scRNA-seq technology which supports cellular barcodes (CB) and unique molecular identifiers (UMI).
According to the kallisto manual, the technology needs to be specified with a tuple of indices indicating the read number, the start position and the end position of the CB, the UMI and the sequence, respectively.
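The index tuples described above can be illustrated with a small Python sketch. It is not part of kallisto and does not reproduce the authors' settings: the helper names `parse_technology` and `extract` are hypothetical, the colon-separated string `0,0,16:0,16,26:1,0,0` is assumed to be a typical barcode/UMI/sequence layout (16 bp CB and 10 bp UMI on the first read, sequence on the second), and treating an end position of 0 as "up to the end of the read" is likewise an assumption.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    read: int   # index of the read (input file) holding this segment
    start: int  # 0-based start position within that read
    end: int    # end position (exclusive); 0 is assumed to mean "end of read"


def parse_technology(spec: str) -> Dict[str, Segment]:
    """Parse a 'CB:UMI:sequence' string of comma-separated index triplets."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected three colon-separated triplets: CB, UMI, sequence")
    layout = {}
    for name, part in zip(("cb", "umi", "seq"), parts):
        fields = part.split(",")
        if len(fields) != 3:
            raise ValueError(f"{name}: expected read,start,end but got {part!r}")
        layout[name] = Segment(*(int(f) for f in fields))
    return layout


def extract(segment: Segment, reads: List[str]) -> str:
    """Cut the bases described by one triplet out of a set of reads."""
    read = reads[segment.read]
    end = segment.end if segment.end > 0 else len(read)
    return read[segment.start:end]


# Hypothetical layout string and toy reads, for illustration only.
layout = parse_technology("0,0,16:0,16,26:1,0,0")
reads = ["A" * 16 + "C" * 10, "GATTACA"]
barcode = extract(layout["cb"], reads)
umi = extract(layout["umi"], reads)
sequence = extract(layout["seq"], reads)
```

Because scATAC-seq libraries carry a cell barcode but no UMI, a layout of this kind has to be adapted before it can describe such data, which is the limitation the highlights point to.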

Summary

Introduction

Recent technological advances in single-cell technologies resulted in a tremendous increase in throughput in a relatively short span of time[1]. Analysis of NGS data benefits from technologies based on k-mer processing, which allow alignment-free sequence comparison[4]. Most of these technologies require a catalog of k-mers that are expected to be in the dataset and are the subject of quantification. While processing of other types of single-cell data was boosted by the implementation of alignment-free techniques, the pipelines available to process scATAC-seq data still require large computational resources. Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
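The idea of a k-mer catalog built from predefined regions can be sketched in a few lines of Python. This is a deliberately simplified illustration, not kallisto's actual algorithm or data structure: the region names, the sequences, the value of k and the function names `build_index` and `compatible_regions` are all made up for the example. Each read is assigned to the set of regions compatible with every catalogued k-mer it contains.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(regions: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer found in the reference regions to the regions containing it."""
    index = defaultdict(set)
    for name, seq in regions.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def compatible_regions(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Intersect the region sets of all catalogued k-mers in the read.

    k-mers absent from the catalog are skipped; a read with no catalogued
    k-mer is compatible with nothing.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Toy reference: two made-up regions that share part of their sequence.
regions = {"dhs1": "ACGTACGTTT", "dhs2": "GGGTACGTTA"}
index = build_index(regions, k=5)

unique_hit = compatible_regions("TACGTTT", index, k=5)    # ends in a dhs1-only k-mer
ambiguous_hit = compatible_regions("GTACGT", index, k=5)  # only shared k-mers
no_hit = compatible_regions("AAAAAAA", index, k=5)        # nothing in the catalog
```

No base-level alignment is computed at any point, which is where the speed advantage of alignment-free approaches comes from; the price is that reads falling outside the catalog are simply not quantified.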



Journal: F1000Research	Publication Date: Mar 20, 2020
Citations: 6	License type: CC BY 4.0


Similar Papers

Fast analysis of scATAC-seq data using a predefined set of genomic regions
Valentina Giansanti ... Davide Cittaro
F1000Research | VOL. 9
20 May 2020

Fast analysis of scATAC-seq data using a predefined set of genomic regions.
Valentina Giansanti ... Ming Tang
F1000Research | VOL. 9
28 May 2020

Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation
Seungbyn Baek ... Insuk Lee
Computational and Structural Biotechnology Journal | VOL. 18
01 Jan 2020

Decision letter: The single-cell chromatin accessibility landscape in mouse perinatal testis development
Deborah Bourc'his ... Marianne E Bronner
31 Jan 2022
