Biallelic expansions of various tandem repeat sequence motifs are possible in RFC1, which encodes the DNA replication/repair protein RFC1, yet only certain repeat motifs cause cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS). CANVAS presents two puzzles: the pathogenic pathway is unknown, and it is not understood why some, but not all, expanded motifs are pathogenic. The most common pathogenic repeat is (AAGGG)n•(CCCTT)n, while (AAAAG)n•(CTTTT)n is the most common non-pathogenic motif. While both intronic motifs can be expanded and transcribed, only r(AAGGG)n is retained in the mutant RFC1 transcript. We show that only the pathogenic motifs form unusual nucleic acid structures. Specifically, DNA and RNA of the pathogenic d(AAGGG)4 and r(AAGGG)4 form G-quadruplexes in potassium solution. Non-pathogenic repeats did not form G-quadruplexes. Triple-stranded structures were formed by the pathogenic motifs, but not by the non-pathogenic motifs. The G- and C-richness of the pathogenic strands, (AAGGG)n•(CCCTT)n, favors formation of G•G•G•G-tetrads and protonated C+-G Hoogsteen base pairs, involved in quadruplex and triplex structures, respectively, each biophysically stabilized by increased hydrogen bonding and pi-stacking interactions relative to the A-T Hoogsteen pairs that could be formed by the non-pathogenic motif (AAAAG)n•(CTTTT)n. The quadruplex ligand TMPyP4 binds the pathogenic quadruplexes. Formation of quadruplexes and triplexes by pathogenic repeats supports toxic-DNA and toxic-RNA modes of pathogenesis at the RFC1 gene and the RFC1 transcript. Our findings with short repeats provide insights into the disease specificity of pathogenic repeat motif sequences and reveal nucleic acid structural features that may be involved in pathogenesis and could be targeted therapeutically.