Abstract

To reduce the impact of the technical variation inherent in single-cell RNA sequencing (scRNA-seq) technologies on the biological interpretation of experiments, rigorous preprocessing and quality control are required to transform raw sequencing reads into high-quality gene and transcript counts. While hundreds of tools have been developed for this purpose, the vast majority of the most widely used tools are built for the R software environment. As an increasing number of new tools are now developed in Python, integrative workflows that leverage tools from both platforms are needed. We have therefore developed SASCRiP (Sequencing Analysis of Single-Cell RNA in Python), a modular single-cell preprocessing workflow that integrates functionality from existing, widely used R and Python packages with additional custom features and visualizations. SASCRiP enables preprocessing of scRNA-seq data derived from technologies that use unique molecular identifier (UMI) sequences within a single Python analysis workflow. We describe the utility of SASCRiP using datasets derived from peripheral blood mononuclear cells sequenced with droplet-based, 3′-end sequencing technology. We highlight SASCRiP's diagnostic visualizations and fully customizable functions, and demonstrate how SASCRiP provides a highly flexible, integrative Python workflow for preparing unprocessed UMI count-based scRNA-seq data for subsequent downstream analyses. SASCRiP is freely available through PyPI or from its GitHub page.

Highlights

  • Since a method for single-cell RNA sequencing was first proposed (Tang et al, 2009), a number of diverse technologies have emerged for studying gene expression at single-cell resolution

  • Droplet-based, 3′-end sequencing technologies such as Drop-seq (Macosko et al, 2015), inDrop (Klein et al, 2015), and in particular 10X Genomics Chromium (Zheng et al, 2017) have become increasingly popular for studying gene expression across different cell types and cellular states, owing to their lower sequencing cost per cell and higher throughput relative to other available technologies. These technologies rely on short unique molecular identifier (UMI) sequences, which are used both to quantify gene expression and to reduce the technical variation inherent in scRNA-seq datasets (Islam et al, 2014; Kivioja et al, 2011)

  • Hundreds of software tools have been developed to perform the preprocessing and quality control steps that turn raw scRNA-seq reads into gene counts, and the vast majority of them have historically been built for the R software environment (Zappia et al, 2018)
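To illustrate how UMIs are used to quantify expression, the sketch below collapses aligned reads into per-cell, per-gene counts. Reads that share the same cell barcode, UMI and gene are treated as PCR duplicates of one original molecule, so each distinct UMI is counted once. This is a minimal sketch under stated assumptions. The function name `count_umis` and the `(cell_barcode, umi, gene)` tuple layout are hypothetical and do not reflect SASCRiP's API. The barcodes and UMIs are toy sequences. Real pipelines also correct sequencing errors in barcodes and UMIs, which this exact-match version does not do.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads to UMI counts per (cell barcode, gene).

    Hypothetical helper for illustration only.
    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Returns a dict mapping (cell_barcode, gene) -> number of distinct UMIs.
    """
    umis_seen = defaultdict(set)
    for cell, umi, gene in reads:
        # PCR duplicates share cell, UMI and gene, so they add nothing new to the set
        umis_seen[(cell, gene)].add(umi)
    # Each distinct UMI stands for one captured mRNA molecule
    return {key: len(umis) for key, umis in umis_seen.items()}


# Toy reads (assumed data): short barcodes/UMIs for readability
reads = [
    ("AAACCTGA", "TTGCA", "CD3E"),
    ("AAACCTGA", "TTGCA", "CD3E"),  # PCR duplicate of the read above
    ("AAACCTGA", "GGATC", "CD3E"),
    ("AAACCTGA", "TTGCA", "MS4A1"),
    ("TTTGGCCA", "CATGA", "CD3E"),
]
counts = count_umis(reads)
print(counts)
```

Here cell `AAACCTGA` has three CD3E reads but only two distinct UMIs, so its CD3E count is 2 rather than 3. Collapsing on UMIs rather than counting raw reads is what reduces amplification bias.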


15 Feb 2022

1. Fabiola Curion, Technische Universität München, Munich, Germany; Luke Zappia, Technische Universität München, Munich, Germany.

Any reports and responses or comments on the article can be found at the end of the article. This article is included in the Bioinformatics gateway. This article is included in the Python collection.
