Secondary structures are non-canonical arrangements of nucleic acids arising from intra-strand interactions, including base pairing, stacking, and other higher-order features that deviate from the standard double-helical conformation. Although these structures have been studied extensively in RNA, they can also form when DNA becomes single-stranded, creating topological roadblocks that can impair essential DNA-based processes such as replication, transcription, and repair, ultimately affecting genome stability. The availability of complete linear sequences of human genomes, including repetitive loci, now makes it possible to predict DNA secondary structures and compare them across genomic regions. Here, we evaluate the intrinsic properties of linear single-stranded DNA sequences sampled from specialized human loci in the CHM13 genome, including centromeres, pericentromeres, ribosomal DNA (rDNA), and coding regions. Our comparative analysis of predicted secondary structures across human chromosomes revealed a heightened presence, complexity, and instability of secondary structures within the centromere. These features decreased gradually through the pericentromere toward the chromosome arms and were, on average, lowest in coding regions. Notably, centromeric repeats exhibited the highest level of topological complexity within both the active and divergent domains, even when compared with other tandem satellite repeats, such as rDNA in acrocentric chromosomes. Our findings provide evidence of the intrinsic self-hybridizing properties of centromeric repeats, which can generate complex topological structures that may correlate functionally with chromosome missegregation, particularly when centromeric chromatin is disrupted. Processes such as long non-coding RNA transcription, recombination, and other mechanisms that dechromatinize and unwind stretches of linear DNA in these regions create in vivo opportunities for the DNA acrobatics predicted here.