Recent advances have finally delivered a complete human genome assembly, including the elusive Y chromosome, closing a significant knowledge gap. Earlier efforts were hampered by the difficulty of sequencing repetitive DNA structures such as direct and inverted repeats. We used the G4Hunter algorithm to identify G-quadruplex-forming sequences (G4s) in the current human reference genome (GRCh38) and in the new telomere-to-telomere (T2T) assemblies, including the Y chromosome. The analysis had two aims: to locate potential G4s within each genome and to examine their association with functionally annotated sequences. Compared with GRCh38, the T2T assembly contained a significantly higher number of G4-forming sequences. These sequences were notably enriched around precursor RNAs, exons, and genes, and within protein-binding sites. This co-occurrence of G4-forming sequences with critical regulatory regions suggests a role in fundamental DNA regulatory processes. Our findings indicate that the current human reference genome significantly underestimates the number of G4s and may therefore obscure their functional importance.
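For readers unfamiliar with G4Hunter, the core scoring idea can be sketched briefly. Each G in a run of consecutive Gs is scored by the run length (capped at 4), each C in a run of Cs receives the corresponding negative value, and A, T, and other bases score 0. A sliding window is then flagged when the absolute mean score reaches a threshold; C-rich windows indicate a potential G4 on the opposite strand. The following is a minimal sketch, not the authors' pipeline. The window size of 25 and threshold of 1.2 are commonly used defaults that we assume here, since the abstract does not state the parameters used. The function names are hypothetical, and the sketch omits the original tool's step of merging overlapping hits into regions.

```python
def base_scores(seq: str) -> list[int]:
    """Per-base G4Hunter-style scores: G runs score +1..+4, C runs -1..-4, others 0."""
    seq = seq.upper()  # genome FASTA files often soft-mask repeats in lowercase
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:  # find the end of this homopolymer run
            j += 1
        run = min(j - i, 4)  # run-length contribution is capped at 4
        value = run if base == "G" else -run if base == "C" else 0
        for k in range(i, j):
            scores[k] = value
        i = j
    return scores


def window_scores(seq: str, window: int = 25) -> list[float]:
    """Mean base score of every sliding window (assumed default window = 25)."""
    scores = base_scores(seq)
    if len(scores) < window:
        return []
    current = sum(scores[:window])
    result = [current / window]
    for start in range(1, len(scores) - window + 1):
        # Running sum: add the base entering the window, drop the one leaving it.
        current += scores[start + window - 1] - scores[start - 1]
        result.append(current / window)
    return result


def flag_windows(seq: str, window: int = 25, threshold: float = 1.2) -> list[int]:
    """Start positions of windows whose |mean score| meets the (assumed) threshold."""
    return [i for i, s in enumerate(window_scores(seq, window)) if abs(s) >= threshold]


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4  # four copies of the human telomeric repeat (24 nt)
    print(window_scores(telomeric, window=24))
    print(flag_windows(telomeric, window=24))
```

Because scores reward runs of three or more Gs, G-rich repetitive regions such as telomeres score highly. This helps explain why resolving previously unassembled repetitive DNA can add many candidate G4s.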