Abstract
An often-overlooked component of cancer genomics workflows is the reference genome. While most genomics analytical tools and findings are based on reference builds GRCh37 and GRCh38, these references are largely Eurocentric, and their representativeness among diverse populations has not been studied comprehensively. A previously published study performed deep whole genome sequencing of over 900 individuals of African ancestry and revealed ∼296 million unique base pairs, assembled into 125,715 African Pan Genome (APG) contig sequences, that are not represented in the reference genome. To further characterize these APG contig sequences, we sought to determine their epigenetic and expression potential. We have previously reported preliminary efforts to characterize the unexplored epigenetic potential of the 125,715 APG contig sequences using EMBOSS cpgplot for CpG island prediction. We identified 8,353 potential CpG islands across 6,352 contig sequences, with 1 to 9 predicted CpG islands per contig. Taken together, ∼5% of all APG contigs contained at least 1 predicted CpG island. To determine the expression potential of these contigs, we mapped RNAseq reads from an African ancestry-enriched triple negative breast cancer cohort to APG contigs that previously failed to map to the reference genome. Approximately 9% of APG contigs had transcriptomic reads mapping to them, and 1,106 APG contigs had both mapped reads and predicted CpG islands. Using the RNAseq mapping from this African ancestry-enriched cohort, we have begun to explore differential expression among contigs by African ancestry population group, highlighting the potential for differential regulation at these understudied loci. Since the publication of the APG contig sequences, the recent telomere-to-telomere (T2T) reference has filled the gaps of GRCh37 and GRCh38; however, it is even less diverse.
Because T2T contains novel sequences that were not available when the APG contigs were assembled, we performed BLASTn queries of the APG contigs against the T2T genome; preliminary analysis shows that ∼17% fail to map to T2T. For contigs with BLASTn hits, we are currently assessing the coverage and identity of the hits to evaluate alignment quality. In conclusion, these APG contig sequences harbor potential epigenetic regulatory function and expressed sequences not represented in our standard reference genomes. While some APG contig sequences are represented in the newer T2T genome, preliminary investigations show that these sequences map to regions of the T2T genome that are novel relative to GRCh37 and GRCh38. As we continue to explore these APG contig sequences, we may find ancestry-specific regulatory mechanisms not yet described.
Citation Format: Rachel Martini, Kyriaki Founta, Sebastian Maurice, Jason White, Lisa Newman, Onyinye Balogun, Nyasha Chambwe, Melissa Davis. Epigenetic insights and gene expression in African pan genome contig sequences [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr C093.
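The abstract mentions assessing the coverage and identity of BLASTn results for APG contigs queried against T2T. It does not say how those measures are computed, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes hits are in the default BLAST tabular layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that contig lengths are supplied separately. The function names, the `query_lengths` mapping, and the example rows are hypothetical.

```python
from collections import defaultdict


def parse_outfmt6(lines):
    """Yield (qseqid, pident, aln_len, qstart, qend) from default BLAST tabular lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qstart, qend = int(f[6]), int(f[7])
        yield f[0], float(f[2]), int(f[3]), min(qstart, qend), max(qstart, qend)


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def summarize_hits(lines, query_lengths):
    """Per contig: fraction of the contig covered by any HSP, and
    alignment-length-weighted mean percent identity (None if no hits)."""
    hsps = defaultdict(list)
    for qid, pident, aln_len, qs, qe in parse_outfmt6(lines):
        hsps[qid].append((pident, aln_len, qs, qe))

    summary = {}
    for qid, qlen in query_lengths.items():
        hits = hsps.get(qid, [])
        if not hits:
            summary[qid] = {"coverage": 0.0, "identity": None}
            continue
        covered = merged_length([(qs, qe) for _, _, qs, qe in hits])
        aligned = sum(aln_len for _, aln_len, _, _ in hits)
        identity = sum(p * n for p, n, _, _ in hits) / aligned
        summary[qid] = {"coverage": covered / qlen, "identity": identity}
    return summary


# Made-up rows for illustration only; these are not results from the study.
EXAMPLE_ROWS = [
    "contigA\tchr1\t98.0\t100\t2\t0\t1\t100\t5000\t5099\t1e-40\t180",
    "contigA\tchr1\t90.0\t100\t10\t0\t51\t150\t7000\t7099\t1e-30\t150",
]
EXAMPLE_LENGTHS = {"contigA": 200, "contigB": 300}  # hypothetical contig lengths

if __name__ == "__main__":
    for contig, stats in summarize_hits(EXAMPLE_ROWS, EXAMPLE_LENGTHS).items():
        print(contig, stats)
```

This sketch pools HSPs from all subject sequences for a contig, which can overstate coverage when hits are scattered across chromosomes. Restricting to the best-scoring subject, or applying coverage and identity thresholds, would be a stricter alternative.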