Abstract

An important consideration when studying cancer in diverse populations is the representativeness of the reference genome. Cancer genomics findings are largely based upon alignments to the GRCh37 and GRCh38 references. With the recent release of the telomere-to-telomere (T2T) and Human Pangenome Reference Consortium references, analyses of sequences distinct from previous references in the cancer context provide exciting opportunities to identify novel targets that may be linked to cancer outcomes. Because GRCh38 and T2T are overwhelmingly Eurocentric, the question remains whether these references are applicable outside the ancestral populations that comprise them. Prior to T2T and the pangenome, whole genome sequencing of 910 individuals of African descent revealed 296.5 million distinct base pairs, which were assembled into 125,715 African Pan Genome (APG) contigs. These contigs represent novel African ancestry genomic sequences whose function was unclear. Here we present provisional efforts to determine the function of APG contigs and whether they are represented in newer references. To determine the epigenetic potential of the 125,715 APG contigs, we used EMBOSS cpgplot to predict CpG islands. We identified 8,353 potential CpG islands; 6,352 contigs had at least 1 predicted CpG island, and 1 contig had 9. The number of predicted CpG islands correlated only weakly with contig length and GC content, suggesting that the predictions are not merely a by-product of sequence length or composition and that these sequences may represent functional regions. Using unmapped transcriptomic reads from our published African ancestry triple negative breast cancer cohort, we sought to determine the expression potential of these contigs. Unmapped reads mapped to ~9% of the contigs, and ~1,100 APG contigs had both predicted CpG islands and aligned unmapped transcriptomic reads. This strongly suggests that these APG contigs contain African ancestry-specific regulatory regions and functional gene coding sequences.
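As an illustration of the kind of CpG island prediction described above, the following is a minimal Python sketch of a sliding-window scan based on the widely used Gardiner-Garden and Frommer criteria (GC content above 50%, observed/expected CpG ratio above 0.6, length of at least 200 bp). It is not the EMBOSS cpgplot implementation, and the abstract does not state which parameters were used; the function names `cpg_window_stats` and `find_cpg_islands`, the window size, and the thresholds are all assumptions chosen for illustration.

```python
def cpg_window_stats(seq: str) -> tuple[float, float]:
    """Return (GC percent, observed/expected CpG ratio) for one window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # observed CpG dinucleotides
    gc_percent = 100.0 * (c + g) / n
    expected = c * g / n  # expected CpG count given base composition
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_percent, obs_exp


def find_cpg_islands(seq, window=100, min_len=200, min_gc=50.0, min_oe=0.6):
    """Return (start, end) half-open intervals of putative CpG islands.

    Hypothetical, simplified scan: slide a window one base at a time,
    merge consecutive passing windows, keep merged runs >= min_len.
    All thresholds are illustrative assumptions.
    """
    islands = []
    run_start = None
    last_start = len(seq) - window
    for i in range(last_start + 1):
        gc, oe = cpg_window_stats(seq[i:i + window])
        passing = gc > min_gc and oe > min_oe
        if passing and run_start is None:
            run_start = i
        elif not passing and run_start is not None:
            end = (i - 1) + window  # end of the last passing window
            if end - run_start >= min_len:
                islands.append((run_start, end))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(seq) - run_start >= min_len:
        islands.append((run_start, len(seq)))
    return islands


if __name__ == "__main__":
    # Toy contig: a CpG-rich stretch flanked by AT-rich sequence.
    contig = "AT" * 100 + "CG" * 150 + "AT" * 100
    print(find_cpg_islands(contig))
```

Applied per contig, a scan of this kind would yield island counts that could then be compared against contig length and GC content, as in the correlation reported above.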
Preliminary BLASTn queries of contigs with regulatory potential revealed that these sequences may be present in T2T. Mapping to T2T and visualization of expression and CpG islands showed that our predictions matched novel T2T sequences. We anticipated that some contigs would map to T2T, given their one- or two-end placement in GRCh38, and wanted to determine whether any of the ~1,100 contigs mapped to T2T. Initial analysis of this contig subset shows that ~15% fail to map to T2T, indicating that T2T may still fail to capture diversity across ancestry groups. These sequences represent previously unstudied expression and epigenetic potential not captured in the reference genome, and we are working to evaluate all 125,715 sequences. Capturing diversity across the genome remains extremely important for contextualizing these findings and for building resources that move precision medicine approaches forward across diverse ancestry groups.

Citation Format: Rachel Martini, Kyriaki Founta, Sebastian Maurice, Jason White, Onyinye Balogun, Nyasha Chambwe, Melissa Davis. Unveiling hidden genomic diversity: Exploring epigenetic potential and gene expression in African Pan Genome contig sequences [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2948.
