Abstract 4166A: An artificial intelligence based meta-analysis of publicly available single cell RNA-seq datasets for hematopoietic and lymphoid malignancies identifies repurposable cancer drug targets

Bei Jiang, Michael Januszyk

doi:10.1158/1538-7445.am2020-4166a

Abstract

Background: Hematopoietic malignancies represent a broad category of diseases that originate in blood or lymphoid tissues and will account for roughly 10 percent of the estimated 1.7 million new cancer diagnoses in 2019. Traditional treatment options include chemotherapy, radiation therapy, and bone marrow transplant. With the recent explosion in -omics level data platforms and increased patient opt-ins for electronic medical record usage, we have entered a new phase of drug development that focuses on data-driven approaches to identify targeted therapeutics. Here we apply artificial intelligence and machine learning principles to publicly available datasets to identify candidates for drug repurposing.
Methods: A meta-analysis was performed using more than 500 publicly available single cell RNA-seq (scRNA-seq) datasets spanning multiple species, platforms, and capture techniques. Fifteen unique sets of study data examining human or mouse-modeled malignancies of hematopoietic and lymphoid origin were identified. Data were downloaded in the most upstream format available (i.e., fastq > bam > mtx > h5/rdata). Mapping from fastq to bam format was performed using STAR v2.5.3a, mapping from bam to mtx files was performed using Cell Ranger v3.0, and mapping from mtx to h5 files was performed using Scanpy v1.4. Data were denoised using variational inference with scVI v0.4.1 to account for study, species, disease, tissue, and subject-specific batch effects. Metadata from each study were extracted and binned by inspection into classifications of 'severe', 'moderate', 'mild', or 'no disease'. Disease severity signatures were constructed accordingly using edgeR v3.9 and paired with chemical perturbation signatures for more than 10,000 drug compounds to identify and score therapeutic candidates.
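The pairing of disease severity signatures with chemical perturbation signatures can be illustrated with a minimal sketch. The abstract does not state which similarity metric or data structures were used, so the Pearson correlation, the per-gene dictionaries, and every gene and compound name below are assumptions chosen for illustration only. This is not the authors' pipeline and does not use any LINCS API.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no directional information
    return cov / sqrt(var_x * var_y)


def rank_compounds(recovery_signature, perturbation_signatures, top_n=20):
    """Rank compounds by correlation with a decreasing-severity signature.

    Both inputs map gene names to log fold changes (an assumed layout).
    Only genes present in both profiles are compared.
    """
    scores = []
    for compound, profile in perturbation_signatures.items():
        shared = sorted(set(recovery_signature) & set(profile))
        if len(shared) < 2:
            continue  # too few shared genes to correlate
        x = [recovery_signature[g] for g in shared]
        y = [profile[g] for g in shared]
        scores.append((compound, pearson(x, y)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy data: log fold changes for decreasing disease severity.
recovery = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5, "GENE_D": -1.5}
perturbations = {
    "compound_1": {"GENE_A": 1.8, "GENE_B": -0.9, "GENE_C": 0.4, "GENE_D": -1.2},
    "compound_2": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5, "GENE_D": 1.5},
    "compound_3": {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_C": -0.1, "GENE_D": 0.0},
}

ranked = rank_compounds(recovery, perturbations, top_n=20)
for compound, score in ranked:
    print(f"{compound}\t{score:+.3f}")
```

In this sketch a compound whose perturbation profile moves genes in the same direction as decreasing severity ranks highest, and one that mirrors the signature exactly in reverse ranks lowest. A rank-based metric or a dedicated connectivity score could be swapped in for `pearson` without changing the ranking logic.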
Results: A total of 588 datasets were evaluated using curations provided in Svensson et al. 2019, yielding fifteen relevant human (n = 9) and mouse (n = 6) studies published in PubMed-indexed journals. These spanned 11 unique journals, the most common of which was Nature Immunology (n = 3). Datasets were obtained from GEO (n = 10), ArrayExpress (n = 2), or laboratory/group-specific web servers (n = 3). Studies evaluated between 63 and 10,260 individual cells (mean 1763, median 362) across six different sequencing platforms. From the 12,442 distinct compounds profiled across 71 cell lines in the LINCS database, we identified the 20 top candidates whose perturbation profiles were most correlated with the decreasing disease severity signatures generated through our scRNA-seq meta-analysis.
Conclusion: This represents a novel method for cancer drug discovery and repurposing using exclusively publicly available datasets.
Citation Format: Bei Jiang, Michael Januszyk. An artificial intelligence based meta-analysis of publicly available single cell RNA-seq datasets for hematopoietic and lymphoid malignancies identifies repurposable cancer drug targets [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 4166A.
