R-loops are non-canonical, three-stranded nucleic acid structures composed of a DNA:RNA hybrid, a displaced single-stranded (ss)DNA, and a trailing ssRNA overhang. R-loops perform critical biological functions under both normal and disease conditions. To elucidate their cellular functions, we need to understand the mechanisms underlying R-loop formation, recognition, signaling, and resolution. Previous high-throughput screens identified multiple proteins that bind R-loops, with many of these proteins containing folded nucleic acid processing and binding domains that prevent (e.g., topoisomerases), resolve (e.g., helicases, nucleases), or recognize (e.g., KH, RRMs) R-loops. However, a significant number of these R-loop interacting Enzyme and Reader proteins also contain long stretches of intrinsically disordered regions (IDRs). The precise molecular and structural mechanisms by which the folded domains and IDRs synergize to recognize and process R-loops or modulate R-loop-mediated signaling have not been fully explored. While studying one such modular R-loop Reader, the Fragile X Protein (FMRP), we unexpectedly discovered that the C-terminal IDR (C-IDR) of FMRP is the predominant R-loop binding site, with the three N-terminal KH domains recognizing the trailing ssRNA overhang. Interestingly, the C-IDR of FMRP has recently been shown to undergo spontaneous Liquid-Liquid Phase Separation (LLPS) assembly by itself or in complex with another non-canonical nucleic acid structure, RNA G-quadruplex. Furthermore, we have recently shown that FMRP can suppress persistent R-loops that form during transcription, a process that is also enhanced by LLPS via the assembly of membraneless transcription factories. These exciting findings prompted us to explore the role of IDRs in R-loop processing and signaling proteins through a comprehensive bioinformatics and computational biology study. Here, we evaluated IDR prevalence, sequence composition and LLPS propensity for the known R-loop interactome. We observed that, like FMRP, the majority of the R-loop interactome, especially Readers, contains long IDRs that are highly enriched in low complexity sequences with biased amino acid composition, suggesting that these IDRs could directly interact with R-loops, rather than being “mere flexible linkers” connecting the “functional folded enzyme or binding domains”. Furthermore, our analysis shows that several proteins in the R-loop interactome are either predicted to or have been experimentally demonstrated to undergo LLPS or are known to be associated with phase separated membraneless organelles. Thus, our overall results present a thought-provoking hypothesis that IDRs in the R-loop interactome can provide a functional link between R-loop recognition via direct binding and downstream signaling through the assembly of LLPS-mediated membrane-less R-loop foci. The absence or dysregulation of the function of IDR-enriched R-loop interactors can potentially lead to severe genomic defects, such as the widespread R-loop-mediated DNA double strand breaks that we recently observed in Fragile X patient-derived cells.
Read full abstract