Abstract

Background: As bulk and single-cell RNA (scRNA) sequencing studies and data continue to accrue in publicly accessible databases, bioinformatic tools continue to be developed to analyze these datasets. The resulting pipelines often focus on either bulk or scRNA sequencing but rarely both. scRNA sequencing allows for in-depth characterization of transcription-level events in distinct cell populations, while bulk data offers a more global view and, for now, is more widely available in databases. Alternative splicing (AS) events in particular have been underexplored in single-cell data. The purpose of this study was to develop a bioinformatic pipeline that employs both bulk and single-cell mammary datasets to identify and validate alternative splicing events.

Methods: Bulk FASTQ files were used to identify AS events in luminal progenitor (LP) and mouse mammary stem cell (MSC) lineages in ovariectomized FVB mice that had been randomized into three treatment groups: SHAM [C]; estradiol + progesterone [EP]; and EP + the selective progesterone receptor inhibitor telapristone acetate (TPA) [EPT]. scRNA sequencing data from Bach et al. (2017) was downloaded from the Gene Expression Omnibus (GEO) and used to validate the AS events identified in the bulk data. To separate the data by progesterone level, the scRNA data was parsed into nulliparous, gestational, lactational, and post-involution groups. Further, to determine cell lineage level effects, the data was parsed into luminal and basal compartments using Krt18 and Krt5 expression. To accomplish the overall comparison between bulk and scRNA sequencing data, we developed a pipeline that processes and parses the scRNA data and identifies alternative splicing in the bulk sequencing data using rMATS and in the scRNA data using Outrigger (part of the Expedition suite).
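The parsing of the scRNA data by developmental stage and keratin marker could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the abstract does not give the actual rule used to call a cell Krt18-high or Krt5-high, so the `min_expr` threshold, the "dominant marker" rule, the record layout, and the function names are all hypothetical, and the cells are toy values rather than data from Bach et al.

```python
from collections import defaultdict


def assign_compartment(krt18, krt5, min_expr=1.0):
    """Label a cell by its dominant keratin marker.

    Assumed rule (not from the abstract): a marker must reach min_expr
    and exceed the other marker for the cell to be assigned.
    """
    if krt18 >= min_expr and krt18 > krt5:
        return "luminal"  # Krt18-high
    if krt5 >= min_expr and krt5 > krt18:
        return "basal"  # Krt5-high
    return "unassigned"


def split_cells(cells, min_expr=1.0):
    """Group cell barcodes by (developmental stage, compartment)."""
    groups = defaultdict(list)
    for cell in cells:
        label = assign_compartment(cell["Krt18"], cell["Krt5"], min_expr)
        groups[(cell["stage"], label)].append(cell["barcode"])
    return dict(groups)


# Toy cells with made-up expression values, for illustration only.
cells = [
    {"barcode": "c1", "stage": "gestational", "Krt18": 5.2, "Krt5": 0.1},
    {"barcode": "c2", "stage": "gestational", "Krt18": 0.0, "Krt5": 4.0},
    {"barcode": "c3", "stage": "nulliparous", "Krt18": 0.3, "Krt5": 0.2},
]
groups = split_cells(cells)
```

Keying the groups by both stage and compartment keeps each subset (for example gestational luminal cells) available as a separate input for splicing analysis.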
The resulting rMATS (bulk) data for the LP and MSC comparisons (CvEP, CvEPT, and EPvEPT) were filtered for events meeting statistical significance (p-value < 0.05, FDR < 0.01). The events from each treatment comparison were then further filtered so that only events unique to the LP or MSC CvEP comparison, respectively, remained. The remaining significant and unique AS events were then compared to the Krt18-high and Krt5-high Outrigger results from each developmental stage based on genomic event coordinates (with a buffer of ±20 base pairs).

Results: Twelve alternative splicing (AS) events were found to be shared between the bulk and scRNA sequencing data in the context of increased progesterone exposure (EP and gestational, respectively) as well as cell lineage (LP and luminal, respectively). Among the genes identified as alternatively spliced with an exact nucleotide match for the splice event were Eif4a2, Pik3c2a, Brd4, Cdh1, and Enah.

Conclusions: Alternative splicing events can be identified and validated between genomic sequencing methods, specifically bulk and single-cell RNA-seq. We hypothesize that, as scRNA-seq matures and the quantification errors caused by short read lengths are addressed by full-length scRNA-seq methods currently in development, comparing single-cell and bulk data may become easier. However, we have shown that, when orthogonal methods for validation may not be feasible, events as specific as alternative splicing events can be validated using publicly available data and in silico methods.

Citation Format: Gannon Cottone, Benjamin T Spike, Elnaz Mirzaei Mehrabad, Seema A Khan, Susan Clare. Validating alternative splicing events between bulk and scRNA sequencing data: A bioinformatic approach [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P3-09-16.
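The filtering and coordinate-matching steps described in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the dictionary fields (`chrom`, `strand`, `start`, `end`, `pvalue`, `fdr`), the function names, and the reading of "unique" as "not significant in any other supplied comparison" are assumptions, and the events are toy records with placeholder gene names. Only the thresholds (p-value < 0.05, FDR < 0.01) and the ±20 bp buffer come from the abstract.

```python
BUFFER_BP = 20  # coordinate tolerance stated in the abstract


def is_significant(event, p_cut=0.05, fdr_cut=0.01):
    """Apply the p-value and FDR cutoffs stated in the abstract."""
    return event["pvalue"] < p_cut and event["fdr"] < fdr_cut


def event_key(event):
    """Identify an event by its genomic coordinates (assumed layout)."""
    return (event["chrom"], event["strand"], event["start"], event["end"])


def unique_significant(target, others):
    """Keep significant target events absent from the other comparisons."""
    elsewhere = {
        event_key(e) for group in others for e in group if is_significant(e)
    }
    return [
        e for e in target
        if is_significant(e) and event_key(e) not in elsewhere
    ]


def coords_match(a, b, buffer_bp=BUFFER_BP):
    """True if both events lie on the same strand within the buffer."""
    return (
        a["chrom"] == b["chrom"]
        and a["strand"] == b["strand"]
        and abs(a["start"] - b["start"]) <= buffer_bp
        and abs(a["end"] - b["end"]) <= buffer_bp
    )


def shared_events(bulk_events, sc_events, buffer_bp=BUFFER_BP):
    """Return (gene, is_exact_match) for bulk events found in scRNA data."""
    shared = []
    for b in bulk_events:
        for s in sc_events:
            if coords_match(b, s, buffer_bp):
                exact = b["start"] == s["start"] and b["end"] == s["end"]
                shared.append((b["gene"], exact))
                break  # report each bulk event at most once
    return shared


# Toy records with placeholder names and coordinates, not real results.
lp_cvep = [
    {"gene": "GeneA", "chrom": "chr1", "strand": "+", "start": 1000,
     "end": 1100, "pvalue": 0.001, "fdr": 0.005},
    {"gene": "GeneB", "chrom": "chr2", "strand": "-", "start": 5000,
     "end": 5200, "pvalue": 0.01, "fdr": 0.2},  # fails the FDR cutoff
    {"gene": "GeneC", "chrom": "chr3", "strand": "+", "start": 700,
     "end": 900, "pvalue": 0.002, "fdr": 0.004},  # also in lp_cvept
    {"gene": "GeneD", "chrom": "chr4", "strand": "+", "start": 300,
     "end": 450, "pvalue": 0.0005, "fdr": 0.001},
]
lp_cvept = [
    {"gene": "GeneC", "chrom": "chr3", "strand": "+", "start": 700,
     "end": 900, "pvalue": 0.003, "fdr": 0.006},
]
luminal_gestational = [
    {"chrom": "chr1", "strand": "+", "start": 1012, "end": 1100},
    {"chrom": "chr4", "strand": "+", "start": 300, "end": 450},
    {"chrom": "chr3", "strand": "+", "start": 700, "end": 900},
]

candidates = unique_significant(lp_cvep, [lp_cvept])
shared = shared_events(candidates, luminal_gestational)
```

Reporting whether each match is exact alongside the buffered match mirrors the distinction the Results draw between shared events and those with an exact nucleotide match.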