Abstract

Introduction: Single-cell RNA sequencing (scRNA-seq) is a powerful technique that profiles the transcriptome of individual cells in a sample to characterize each cell's gene expression. Final 10x scRNA-seq libraries are composed of standard Illumina paired-end constructs. Read 2 encodes the sequence of the cDNA fragment used in library preparation, so it reflects the transcribed sequence of a particular gene. We hypothesize that single nucleotide polymorphisms (SNPs) and global ancestry can be inferred from read 2.

Methods: Read 2 FASTQ files were mapped to the reference genome using the STAR algorithm (2-pass), and subsequent steps were performed according to GATK best practices. For reference populations, we selected 2142 unrelated individuals from the 1000 Genomes Project with admixture >80% in a single ancestral population, using k=5 ancestral populations. For quality control of both the fallopian tube (FT) samples and the reference, SNPs with call rates < 95% or minor allele frequency (MAF) < 1% were filtered out. Genotypes of FT cases were merged with pruned reference genotypes (markers), and intersecting SNPs were used for further analysis. Data were visualized using PCA and ADMIXTURE (k=5). Clinical and demographic variables available in our dataset included age at time of surgery, race, ethnicity, country of birth, and specific germline mutation. Whole genome sequencing of matched germline DNA from the same samples was performed as a validation step.

Results: A total of 34 cases underwent scRNA-seq. FT samples with germline mutations included seven BRCA1, seven BRCA2, five PALB2, two RAD51, two MLH1, and one ATM. Additionally, we sequenced six FT samples with no germline mutations and four ovarian cancer tumors. Self-identified race included 2 Asian, 9 Black, 16 White Hispanic, and 6 White non-Hispanic. A total of 730,382 SNPs intersected between all samples and the pruned references. PCA visualization showed correlation between self-identified race and inferred global ancestry using five ancestral populations. Furthermore, there was high correlation among country of birth, the position of cases in the PCA, and ADMIXTURE results with k=5.

Conclusion: Inferring global ancestry from scRNA-seq reads will help estimate genetic ancestry in datasets where this variable is missing or was not captured. Genetic ancestry is a powerful tool that should be considered when generalizing gene expression results to populations.

Citation Format: Alex P. Sanchez-Covarrubias, Ashlee Sealy, Dylan Thompson, David Samuel, Ayodele Omotoso, Destiny Burnett, Matthew Schlumbrecht, Sophia George. Inferring global ancestry using scRNA sequencing reads of the fallopian tube and ovarian cancer [abstract]. In: Proceedings of the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2023 Sep 29-Oct 2; Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2023;32(12 Suppl):Abstract nr C005.
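
The SNP quality-control and intersection steps described in the Methods can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline, which the abstract says relied on STAR and GATK best practices. The function names, the 0/1/2 alternate-allele encoding, and the use of `None` for a missing call are all assumptions made for this example.

```python
def snp_passes_qc(calls, min_call_rate=0.95, min_maf=0.01):
    """Return True if one SNP passes the call-rate and MAF thresholds.

    calls: per-sample alternate-allele counts (0, 1, or 2), with None
    marking a missing genotype. This encoding is an assumption.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    call_rate = len(observed) / len(calls)
    # Each diploid sample contributes two alleles.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return call_rate >= min_call_rate and maf >= min_maf


def intersecting_snps(cases, reference):
    """Return sorted SNP IDs that pass QC in both datasets.

    cases, reference: dicts mapping a SNP ID to its list of calls
    (hypothetical in-memory layout, for illustration only).
    """
    passed_cases = {s for s, calls in cases.items() if snp_passes_qc(calls)}
    passed_ref = {s for s, calls in reference.items() if snp_passes_qc(calls)}
    return sorted(passed_cases & passed_ref)


# Toy data: rs2 is monomorphic in cases, rs3 has a 50% call rate in cases,
# and rs4 is absent from the cases altogether.
cases = {
    "rs1": [0, 1, 2, 1] * 5,
    "rs2": [0] * 20,
    "rs3": [1, None] * 10,
}
reference = {
    "rs1": [0, 1] * 10,
    "rs3": [1] * 20,
    "rs4": [0, 2] * 10,
}
shared = intersecting_snps(cases, reference)
```

In practice these filters are usually applied with dedicated genotype tools rather than hand-written loops; the sketch only makes the thresholds (call rate of at least 95%, MAF of at least 1%) and the intersection logic concrete.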

