Abstract

Introduction: The Lung Precancer Atlas (PCA) is developing bulk and single-cell RNA-seq pipelines to process Lung PCA sequencing data collected across our multi-institutional consortium. The pipelines will be publicly available to run on the Terra cloud platform so that they can be used by other members of the NCI Human Tumor Atlas Network (HTAN) and the broader research community.

Method: We built our single-cell and bulk RNA-seq pipelines for integration into the Terra cloud platform. Terra uses a Google Cloud/BigQuery backend to store data with built-in security features. The platform provides a uniform resource for pipeline development and data processing for investigators located across multiple institutions.

Result: The RNA-seq pipeline in Terra used by GTEx and HTOP uses STAR to align reads to a reference genome, and RSEM and RNA-SeQC 2 to quantify expression and compute quality metrics. We have added quality metrics from the FastQC and RSeQC tools as well as GATK germline variant calling. Somalier is used to perform fast fingerprinting of multiple samples derived from the same patient. The pipeline was also modified to estimate TCR/BCR repertoires using TRUST to facilitate downstream analyses of the immunological status of lung premalignant lesions. We have also enhanced Terra pipelines for droplet-based and plate-based single-cell RNA sequencing data. The single-cell RNA-seq preprocessing pipeline for 10X data includes steps from Cell Ranger for demultiplexing, alignment to a reference genome, and count matrix generation. We have added the quality control pipeline from the singleCellTK package, which generates and aggregates quality control metrics from 8 different tools, including tools for doublet detection and ambient RNA quantification. For plate-based CEL-Seq2 data, we have built a pipeline that uses the scruff package for alignment and singleCellTK for quality control. The count matrices and QC metrics are aggregated into SingleCellExperiment or SummarizedExperiment R objects for downstream analyses.

Conclusion: Our completed Terra pipelines will allow researchers in the Lung PCA to process RNA sequencing data using a consistent set of tools and gene annotations. These pipelines, and the standardization of data processing and quality control that they provide, may be of use to other investigators in the Human Tumor Atlas Network as well as to the broader scientific community.

Citation Format: Chris Husted, François Aguet, Conor Shea, Adam Gower, William Mischler, Yusuke Koga, Rui Hong, Steven Dubinett, Avrum Spira, Sarah A. Mazzilli, Ethan Cerami, Ignaty Leshchiner, Marc E. Lenburg, Gad Getz, Jennifer E. Beane, Joshua D. Campbell. Cloud-based bulk and single-cell RNAseq pipelines in the Terra platform for the Lung PCA [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 171.

