Abstract

R-loops are non-canonical, three-stranded nucleic acid structures composed of a DNA:RNA hybrid, a displaced single-stranded (ss)DNA, and a trailing ssRNA overhang. R-loops perform critical biological functions under both normal and disease conditions. To elucidate these functions, we need to understand the mechanisms underlying R-loop formation, recognition, signaling, and resolution. Previous high-throughput screens identified multiple R-loop-binding proteins, many of which contain folded nucleic acid processing and binding domains that prevent (e.g., topoisomerases), resolve (e.g., helicases, nucleases), or recognize (e.g., KH domains, RRMs) R-loops. However, a significant number of these R-loop-interacting Enzyme and Reader proteins also contain long intrinsically disordered regions (IDRs). The precise molecular and structural mechanisms by which the folded domains and IDRs act together to recognize and process R-loops, or to modulate R-loop-mediated signaling, have not been fully explored. While studying one such modular R-loop Reader, the Fragile X Protein (FMRP), we unexpectedly discovered that the C-terminal IDR (C-IDR) of FMRP is the predominant R-loop binding site, whereas the three N-terminal KH domains recognize the trailing ssRNA overhang. Interestingly, the C-IDR of FMRP has recently been shown to undergo spontaneous liquid-liquid phase separation (LLPS), either alone or in complex with another non-canonical nucleic acid structure, the RNA G-quadruplex. Furthermore, we have recently shown that FMRP can suppress persistent R-loops that form during transcription, a process that is itself enhanced by LLPS through the assembly of membraneless transcription factories. These findings prompted us to explore the role of IDRs in R-loop processing and signaling proteins through a comprehensive bioinformatics and computational biology study.
Here, we evaluated IDR prevalence, sequence composition, and LLPS propensity across the known R-loop interactome. We observed that, like FMRP, the majority of the R-loop interactome, especially the Readers, contains long IDRs that are highly enriched in low-complexity sequences with biased amino acid composition. This suggests that these IDRs could interact directly with R-loops rather than being “mere flexible linkers” connecting the “functional folded enzyme or binding domains”. Furthermore, our analysis shows that several proteins in the R-loop interactome are predicted to undergo LLPS, have been experimentally shown to undergo LLPS, or are known to associate with phase-separated membraneless organelles. Thus, our overall results present a thought-provoking hypothesis: IDRs in the R-loop interactome can provide a functional link between R-loop recognition via direct binding and downstream signaling through the assembly of LLPS-mediated membraneless R-loop foci. Loss or dysregulation of IDR-enriched R-loop interactors could potentially lead to severe genomic defects, such as the widespread R-loop-mediated DNA double-strand breaks that we recently observed in Fragile X patient-derived cells.
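The idea of "low-complexity sequences with biased amino acid composition" can be illustrated with a short, hypothetical Python sketch. This is not the authors' pipeline. The helper names (`composition`, `fraction_in`, `shannon_entropy`, `low_complexity_windows`), the toy sequences, the 12-residue window, and the 2.2-bit entropy cutoff are all illustrative assumptions. The sketch computes the residue composition of a segment, the fraction of positions occupied by the residue sets that the Highlights list as enriched in Reader IDRs (Gly, Ser, Arg, Pro) and Enzyme IDRs (Glu, Lys, Arg, Ser), and flags windows of low Shannon entropy as a simple proxy for low complexity.

```python
from collections import Counter
from math import log2

# Residue sets listed as enriched in the Highlights (one-letter codes).
READER_SET = set("GSRP")  # Gly, Ser, Arg, Pro
ENZYME_SET = set("EKRS")  # Glu, Lys, Arg, Ser


def composition(seq: str) -> dict[str, float]:
    """Fraction of each residue in a non-empty sequence, keyed alphabetically."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: c / n for aa, c in sorted(counts.items())}


def fraction_in(seq: str, residues: set[str]) -> float:
    """Fraction of positions in a non-empty sequence occupied by any residue in `residues`."""
    return sum(aa in residues for aa in seq) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue distribution; lower means less complex."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def low_complexity_windows(seq: str, window: int = 12, max_bits: float = 2.2) -> list[int]:
    """Start indices of windows whose entropy is below `max_bits`.

    The window size and cutoff are hypothetical illustration values,
    not parameters taken from the study.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < max_bits
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences, not real protein segments.
    examples = {
        "idr_like": "GGRGSRGGPRGSGGRPGS",
        "mixed": "ACDEFGHIKLMNPQRSTVWY",
    }
    for name, seq in examples.items():
        print(
            name,
            f"reader={fraction_in(seq, READER_SET):.2f}",
            f"enzyme={fraction_in(seq, ENZYME_SET):.2f}",
            f"entropy={shannon_entropy(seq):.2f}",
            f"low_complexity_windows={low_complexity_windows(seq)}",
        )
```

Shannon entropy is used here because low sequence complexity is, at its simplest, a skewed residue distribution. Dedicated disorder and phase-separation predictors model much more than composition, so a sketch like this only illustrates the concept.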

Highlights

  • Co-transcriptional R-loops are widespread and functional noncanonical nucleic acid structures (Santos-Pereira and Aguilera, 2015; Crossley et al, 2019; García-Muse and Aguilera, 2019; Hegazy et al, 2020)

  • We found that the IDRs of the R-loop interactome contain low-complexity sequences strongly biased toward a few residues (Glu, Ser, Lys, Pro, Gly, Ala, and Arg), with the IDRs of the R-loop Readers enriched in Gly, Ser, Arg, and Pro and the IDRs of the Enzymes enriched in Glu, Lys, Arg, and Ser

  • We present a thought-provoking hypothesis that these IDRs could be the predominant sites of interaction with R-loops, as we recently discovered for the C-IDR of FMRP

Introduction

Co-transcriptional R-loops are widespread and functional noncanonical nucleic acid structures (Santos-Pereira and Aguilera, 2015; Crossley et al, 2019; García-Muse and Aguilera, 2019; Hegazy et al, 2020). R-loops can occupy as much as 5% of the genome, typically at promoter and terminator regions as well as at ribosomal DNA and transfer RNA genes (Sanz et al, 2016). Understanding how R-loops form and interact with proteins, and which processes regulate or are regulated by R-loops, is an important first step toward determining their cellular functions. Unravelling the structural and binding mechanisms used by proteins that regulate R-loop formation, prevention, and resolution, and understanding how these processes are dysregulated in pathological conditions, is vital for developing novel therapeutics that target the biological functions of R-loops.
