Abstract

RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of post-transcriptional gene regulation. Although numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of RBP targets by integrating their available structural features in all three dimensions remains a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on real CLIP-seq datasets, we demonstrate that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and accurately predict RBP binding sites. In addition, we have conducted the first study to show that integrating additional RNA tertiary structural features can improve model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides new evidence to support the view that RBPs may have specific tertiary structural binding preferences. In particular, tests on internal ribosome entry site (IRES) segments yield satisfactory results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model.
The source code of our approach is available at https://github.com/thucombio/deepnet-rbp.

Highlights

  • RNA-binding proteins (RBPs) play important roles in various cellular processes, such as alternative splicing, RNA editing, mRNA localization and translational regulation [1]

  • We validated the performance of our deep learning framework on 24 datasets of HITS-CLIP-, PAR-CLIP- and iCLIP-derived RBP binding sites, of which 23 were derived from doRiNA [38], and the remaining one, which measured PTB binding sites by HITS-CLIP, was derived from [39]

  • mDBN- stands for the multimodal deep belief network (DBN) that integrates only the RNA base sequence and secondary structural profiles, while mDBN+ stands for the framework that integrates the RNA base sequence, secondary and tertiary structural profiles
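
The distinction between the two variants can be illustrated with a minimal sketch. This page does not specify how the sequence and structural profiles are encoded, so the one-hot sequence encoding, the modality names and the helper functions below are assumptions made for illustration only and are not taken from the authors' code.

```python
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGU"

# Hypothetical modality names: which input profiles each variant integrates,
# following the mDBN- / mDBN+ definitions in the highlight above.
MODALITIES: Dict[str, Tuple[str, ...]] = {
    "mDBN-": ("sequence", "secondary"),
    "mDBN+": ("sequence", "secondary", "tertiary"),
}


def one_hot_encode(seq: str) -> List[List[int]]:
    """One-hot encode an RNA sequence (assumed encoding, not the paper's).

    DNA-style 'T' is treated as 'U'; any other unknown base (e.g. 'N')
    becomes an all-zero row.
    """
    normalized = seq.upper().replace("T", "U")
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in normalized]


def select_inputs(variant: str, profiles: Dict[str, list]) -> Dict[str, list]:
    """Return only the modality profiles that the given variant integrates."""
    return {name: profiles[name] for name in MODALITIES[variant]}
```

For example, given a dictionary with `sequence`, `secondary` and `tertiary` entries, `select_inputs("mDBN-", profiles)` drops the tertiary profile, while `select_inputs("mDBN+", profiles)` keeps all three.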


Introduction

RNA-binding proteins (RBPs) play important roles in various cellular processes, such as alternative splicing, RNA editing, mRNA localization and translational regulation [1]. RBPs contain several special RNA-binding domains (RBDs), e.g. the RNA recognition motif (RRM) and the hnRNP K-homology (KH) domains, which recognize target sites according to the RNA primary sequence and the corresponding structural profiles [2]. Identifying RNA–protein interactions and modeling RBP binding preferences are important for decoding the post-transcriptional processes involving RBPs and their mechanisms of pathogenesis in human diseases. The advent of high-throughput experimental methods, such as cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) protocols, has greatly advanced genome-wide studies of RNA–protein interactions [5,6,7,8]. Despite the success of these experimental techniques, the collected data still suffer from false-positive and false-negative problems due
