G-quadruplexes (G4s) are distinctive four-stranded DNA or RNA structures found within cells that are thought to play functional roles in gene regulation and transcription, translation, recombination, and DNA damage/repair. While G4 structures can be uni-, bi-, or tetramolecular with respect to strands, folded unimolecular conformations are most significant in vivo. Unimolecular G4s can potentially form in sequences with runs of guanines interspersed with what will become loops in the folded structure: 5'GxLyGxLyGxLyGx, where Gx is a run of x guanines (x is typically 2-4) and Ly is a loop of y nucleotides (y is highly variable). Such sequences are highly conserved and specifically located in genomes. In the folded structure, guanines from each run combine to form planar tetrads with four hydrogen-bonded guanine bases; these tetrads stack on one another to produce four strand segments aligned in specific parallel or antiparallel orientations, connected by the loop sequences. Three types of loops (lateral, diagonal, or "propeller") have been identified. The stacked tetrads form a central cavity that features strong coordination sites for monovalent cations that stabilize the G4 structure, with potassium or sodium preferred. A single monomeric G4 typically forms from a sequence containing roughly 20-30 nucleotides. Such short sequences have been the primary focus of X-ray crystallographic or NMR studies that have produced high-resolution structures of a variety of monomeric G4 conformations. These structures are often used as the basis for drug design efforts to modulate G4 function.
We believe that the focus on monomeric G4 structures formed by such short sequences is perhaps myopic. Short sequences for structural studies are often arbitrarily selected and removed from their native genomic sequence context, and are then often altered from their native sequences by base substitutions or deletions intended to optimize the formation of a homogeneous G4 conformation.
We believe instead that G-quadruplexes prefer company and that, in a longer natural sequence context, multiple adjacent G4 units can form and combine into more complex multimeric G4 structures with richer topographies than simple monomeric forms. Bioinformatic searches of the human genome show that longer sequences with the potential for forming multiple G4 units are common. Telomeric DNA, for example, has a single-stranded overhang of hundreds of nucleotides whose repetitive sequence has the potential to form multiple G4s. Numerous extended promoter sequences have similar potential for multimeric G4 formation. X-ray crystallography and NMR methods are challenged by these longer sequences (>30 nt), so other tools are needed to explore the possible multimeric G4 landscape. We have implemented an integrated structural biology approach to address this challenge. This approach integrates experimental biophysical results with atomic-level molecular modeling and molecular dynamics simulations that provide quantitatively testable model structures. In every long sequence we have studied so far, we found that multimeric G4 structures readily form, with a surprising diversity of structures dependent on the exact native sequence used. In some cases, stable hairpin duplexes form along with G4 units to provide an even richer landscape. This Account provides an overview of our approach and recent progress, and offers a new perspective on the G-quadruplex folding landscape.
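As a rough illustration of how the sequence motif 5'GxLyGxLyGxLyGx can be used to flag stretches with the potential for multiple G4 units, the pattern can be written as a regular expression. This is a minimal sketch, not the search method used in the work summarized here. The function name `find_g4_motifs` is hypothetical, the G-run length of 2-4 follows the text, and the loop length bound of 1-7 nucleotides is an assumption borrowed from common pattern-based heuristics, since the text says only that y is highly variable. A match indicates sequence potential only; it says nothing about whether or how the sequence folds.

```python
import re


def find_g4_motifs(seq, g_min=2, g_max=4, loop_min=1, loop_max=7):
    """Return (start, end, matched_sequence) for non-overlapping putative G4 motifs.

    Implements the pattern GxLyGxLyGxLyGx on one strand only.
    The loop bounds are an assumed heuristic, not taken from the text.
    """
    g_run = f"G{{{g_min},{g_max}}}"
    # Lazy loop quantifier so each motif ends at the earliest possible G-run.
    loop = f"[ACGTU]{{{loop_min},{loop_max}}}?"
    pattern = re.compile(f"(?:{g_run}{loop}){{3}}{g_run}")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


# Eight human telomeric repeats (48 nt), enough for two adjacent putative G4 units.
telomere = "TTAGGG" * 8
hits = find_g4_motifs(telomere)
for start, end, motif in hits:
    print(start, end, motif)
```

Because `re.finditer` reports non-overlapping matches from left to right, the number of hits is only a lower-bound style estimate of how many G4 units a long sequence could accommodate; alternative registers of G-runs are not enumerated.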