SASCRiP: A Python workflow for preprocessing UMI count-based scRNA-seq data

Darisia Moonsamy, Nikki Gentle

doi:10.12688/f1000research.75243.1

Abstract

To reduce the impact of the technical variation inherent in single-cell RNA sequencing (scRNA-seq) technologies on the biological interpretation of experiments, rigorous preprocessing and quality control are required to transform raw sequencing reads into high-quality gene and transcript counts. While hundreds of tools have been developed for this purpose, the vast majority of the most widely used tools are built for the R software environment. With an increasing number of new tools now being developed using Python, it is necessary to develop integrative workflows that leverage tools from both platforms. We have therefore developed SASCRiP (Sequencing Analysis of Single-Cell RNA in Python), a modular single-cell preprocessing workflow that integrates functionality from existing, widely used R and Python packages with additional custom features and visualizations. SASCRiP enables preprocessing of scRNA-seq data derived from technologies that use unique molecular identifier (UMI) sequences within a single Python analysis workflow. We describe the utility of SASCRiP using datasets derived from peripheral blood mononuclear cells sequenced using droplet-based, 3′-end sequencing technology. We highlight SASCRiP’s diagnostic visualizations and fully customizable functions, and demonstrate how SASCRiP provides a highly flexible, integrative Python workflow for preparing unprocessed UMI count-based scRNA-seq data for subsequent downstream analyses. SASCRiP is freely available through PyPI or from the GitHub page.
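As a rough illustration of what the quality control stage of such a workflow does, the sketch below filters cells from a small UMI count table using per-cell thresholds. It is not SASCRiP's API: the function name `filter_cells`, the threshold values, the use of a mitochondrial fraction as a QC metric, and the toy counts are all assumptions made for illustration.

```python
from typing import Dict

# Toy UMI counts: cell barcode -> {gene: UMI count}. All values are invented.
Counts = Dict[str, Dict[str, int]]


def filter_cells(
    counts: Counts,
    min_umis: int,
    min_genes: int,
    max_mito_frac: float,
    mito_prefix: str = "MT-",  # assumed naming convention for mitochondrial genes
) -> Counts:
    """Keep cells that pass three common (assumed) per-cell QC thresholds."""
    kept: Counts = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())                          # total UMIs in the cell
        detected = sum(1 for n in genes.values() if n > 0)   # genes with at least one UMI
        mito = sum(n for g, n in genes.items() if g.startswith(mito_prefix))
        mito_frac = mito / total if total else 1.0           # empty cells always fail
        if total >= min_umis and detected >= min_genes and mito_frac <= max_mito_frac:
            kept[barcode] = genes
    return kept


toy = {
    "AAAC": {"CD3E": 5, "LYZ": 3, "MT-CO1": 1},  # passes all thresholds
    "AAAG": {"CD3E": 1, "LYZ": 0, "MT-CO1": 0},  # too few UMIs and genes
    "AAAT": {"CD3E": 1, "LYZ": 1, "MT-CO1": 8},  # mitochondrial fraction too high
}
kept = filter_cells(toy, min_umis=5, min_genes=2, max_mito_frac=0.2)
print(sorted(kept))
```

In practice the thresholds would be chosen per dataset, typically after inspecting diagnostic plots of these per-cell metrics.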

Highlights

Since a method for single-cell RNA sequencing was first proposed (Tang et al, 2009), a number of diverse technologies have emerged for studying gene expression at single-cell resolution.
Droplet-based, 3′-end sequencing technologies such as Drop-seq (Macosko et al, 2015), inDrop (Klein et al, 2015), and in particular, 10X Genomics Chromium (Zheng et al, 2017) have become increasingly popular for studying gene expression across different cell types and cellular states due to their lower sequencing cost per cell and higher throughput relative to other available technologies. These technologies rely on short unique molecular identifier (UMI) sequences, which are used both to quantify gene expression and to reduce the technical variation inherently present in scRNA-seq datasets (Islam et al, 2014; Kivioja et al, 2011).
Hundreds of software tools have been developed to perform the preprocessing and quality control steps that scRNA-seq data require, and the vast majority of these tools have historically been built for the R software environment (Zappia et al, 2018).
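The role of UMIs in quantification can be sketched in a few lines. The example below is a minimal illustration, not the method used by SASCRiP or any specific tool: it assumes reads have already been assigned to a cell barcode and a gene, and it counts distinct UMIs per (cell, gene) pair so that reads sharing a UMI are counted once. The function name `count_umis` and the toy reads are hypothetical, and correction of sequencing errors in UMIs, which real tools often perform, is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell barcode, UMI, gene); a simplified, assumed layout


def count_umis(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene) pair."""
    umis_seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode, umi, gene in reads:
        # Reads with the same barcode, gene, and UMI collapse into one set entry.
        umis_seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


reads = [
    ("AAAC", "TTGA", "CD3E"),
    ("AAAC", "TTGA", "CD3E"),  # same UMI as the read above, treated as a duplicate
    ("AAAC", "GGCA", "CD3E"),
    ("AAAG", "TTGA", "LYZ"),   # same UMI but a different cell, counted separately
]
umi_counts = count_umis(reads)
print(umi_counts)
```

Here four reads yield a count of 2 for CD3E in cell `AAAC` and 1 for LYZ in cell `AAAG`, because the duplicated read contributes only once.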

Summary

15 Feb 2022

1. Fabiola Curion, Technische Universität München, Munich, Germany; Luke Zappia, Technische Universität München, Munich, Germany. Any reports and responses or comments on the article can be found at the end of the article. This article is included in the Bioinformatics gateway. This article is included in the Python collection.


Journal: F1000Research	Publication Date: Feb 15, 2022
Citations: 1	License type: CC BY 4.0


