The advent of high-throughput technologies has led to the identification of many new in vivo protein-RNA interactions. While much work has been done to characterize these interactions based on RNA secondary structure, less effort has gone into identifying the three-dimensional (3D) RNA motifs that are essential for protein-RNA recognition. This discrepancy is not for lack of interest, but arises from the significant challenges of modeling protein-RNA complexes in 3D. The challenges are, in general, twofold: (i) sampling possible protein-RNA interactions and (ii) ranking the resulting models by their likelihood of existing in the cell, a process also known as “scoring”. The former is often intractable because the flexibility of RNA adds significant complexity to the modeling process, but the search space can be reduced with experimental input on possible interacting regions (e.g. from HITS-CLIP, PAR-CLIP or iCLIP data). We designed a new 3D modeling protocol that generates reasonable models of protein-RNA complexes by accounting for both protein flexibility (using torsional angle sampling) and RNA flexibility (using previously developed “natural moves” for RNA). Using this protocol, we are able to obtain native-like protein-RNA interactions, even when modeling starts from the unbound RNA conformation. As expected, including experimental constraints significantly improves sampling efficiency. Because our protein-RNA models are robust and diverse, they serve as benchmark decoy datasets for evaluating protein-RNA scoring functions, the second of the challenges above. With our decoy set, we are able to identify constructive features of different scoring functions and to highlight limitations that require further development. With improved scoring functions and interacting regions identified experimentally, we foresee that our protocol can be used to effectively model novel protein-RNA complexes in 3D.
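As an illustration of how a decoy set can be used to evaluate a scoring function, the following minimal Python sketch ranks decoys by score and reports where the first native-like model appears. It is not the protocol described above: the `Decoy` record, the function names, the 5.0 Å RMSD cutoff, the convention that lower scores are better, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Decoy:
    """One modeled complex (hypothetical record, for illustration only)."""
    name: str
    score: float  # value from the scoring function; lower is assumed better
    rmsd: float   # deviation from the native complex, in angstroms


def best_native_like_rank(decoys: Sequence[Decoy],
                          rmsd_cutoff: float = 5.0) -> Optional[int]:
    """Return the 1-based rank of the best-scored native-like decoy.

    A decoy counts as native-like when its RMSD is at or below
    rmsd_cutoff (an assumed threshold). Returns None if no decoy qualifies.
    """
    ranked = sorted(decoys, key=lambda d: d.score)
    for rank, decoy in enumerate(ranked, start=1):
        if decoy.rmsd <= rmsd_cutoff:
            return rank
    return None


def top_n_success(decoys: Sequence[Decoy], n: int = 10,
                  rmsd_cutoff: float = 5.0) -> bool:
    """True if a native-like decoy is among the n best-scored models."""
    rank = best_native_like_rank(decoys, rmsd_cutoff)
    return rank is not None and rank <= n


# Invented example values, not data from the study.
example_decoys = [
    Decoy("d1", score=-12.0, rmsd=9.4),
    Decoy("d2", score=-10.5, rmsd=2.1),
    Decoy("d3", score=-8.0, rmsd=14.7),
    Decoy("d4", score=-3.2, rmsd=3.8),
]

if __name__ == "__main__":
    print(best_native_like_rank(example_decoys))
    print(top_n_success(example_decoys, n=1))
```

Under these assumptions, a scoring function that places a native-like decoy near the top of the ranking across many complexes would be considered more discriminating than one that does not; other metrics could be substituted in the same structure.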