Abstract 2075: Highly customizable multi-sample single cell RNA-Seq pipeline on the CGC

Nevena Vukojicic, Dalibor Veljkovic, Zelia Worman, Aleksandar Danicic, Marko Matic, Brandi Davis-Dusenbery, Rowan Beck, Jack DiGiovanna

doi:10.1158/1538-7445.am2023-2075

Abstract

Single-cell (sc) transcriptomics has revolutionized our understanding of the biological characteristics and dynamics of cancer development. It can help us identify rare cell subpopulations and understand mechanisms associated with tumorigenesis, progression, and response to therapy. The most important step in the analysis of any scRNA-seq dataset is subpopulation identification, usually performed via unsupervised clustering, followed by gene marker identification. We created a highly customizable workflow for sc data analysis, implemented in Common Workflow Language (CWL) on the Cancer Genomics Cloud (CGC) platform. The NCI-funded CGC platform, powered by Seven Bridges, provides a collaborative cloud-based computation infrastructure that collocates computation, over 750 bioinformatics workflows, and more than 3 PB of data, making the analysis of large datasets accessible to researchers from any environment. The “Multi-Sample Clustering and Gene Marker Identification with Seurat 4.1.0” workflow comprises the following steps: Loading scRNA-seq Expression Datasets, Quality Control and Preprocessing, and Clustering and Identification of Gene Markers. Our solution supports gene-cell count matrices generated by several commonly used quantifiers (for example, Cell Ranger counts, Salmon Alevin, Kallisto BUStools, STAR) from single or multiple sc datasets from different batches, as well as single or multiple single-cell samples combined in a single SingleCellExperiment object. The versatility of the pipeline comes from the options implemented in each step. Quality control can be performed manually or automatically, with several options for normalization (LogNormalize, Deconvolution, SCnorm and Linnorm) and for batch effect correction (Seurat and Harmony). For clustering, the pipeline uses Seurat's graph-based approach, with options for different clustering resolutions.
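
To make the LogNormalize option concrete, here is a minimal sketch in Python. It is not the workflow's own code (the pipeline runs Seurat in R); it assumes the commonly documented definition of log-normalization, in which each cell's counts are divided by that cell's total, multiplied by a scale factor of 10,000, and passed through log(1 + x). The names `log_normalize` and `toy_counts` are hypothetical and used only for illustration.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize a cells x genes matrix given as a list of lists.

    Assumed definition (illustrative, not taken from the abstract):
    value = log1p(count / total_counts_in_cell * scale_factor)
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has no signal to rescale; keep it at zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append(
            [math.log1p(c / total * scale_factor) for c in cell]
        )
    return normalized


# Hypothetical toy matrix: 3 cells (rows) x 3 genes (columns).
toy_counts = [
    [3, 0, 7],
    [0, 0, 0],
    [1, 1, 2],
]
normalized = log_normalize(toy_counts)
```

Dividing by the per-cell total removes differences in sequencing depth between cells, so that downstream clustering compares relative expression rather than raw library size.
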
After identifying gene markers for each cluster, a researcher can test differential expression using various methods, including wilcox, bimod, roc, and DESeq2. Here, we demonstrate the application of this workflow to a typical sc analysis by processing an open access dataset of 61k cells isolated from embryonal mouse pons and forebrain, two major brain tumor locations. We used different clustering resolutions to achieve different degrees of granularity and identified cluster-specific marker genes used to identify vulnerable cell populations. To enable researchers to use this analysis as a guideline, we made it available as a public project. Further development of single-cell sequencing techniques will undoubtedly improve our understanding of tumor biology and highlight promising drug targets. CGC’s cloud-based computation infrastructure, along with numerous available cancer datasets and easy-to-use single-cell data processing workflows, among others, will be instrumental in this process.

Citation Format: Nevena Vukojicic, Aleksandar Danicic, Zelia Worman, Rowan Beck, Dalibor Veljkovic, Marko Matic, Jack DiGiovanna, Brandi Davis-Dusenbery. Highly customizable multi-sample single cell RNA-Seq pipeline on the CGC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2075.
