Abstract

Sequence-specific interactions of RNA-binding proteins (RBPs) with their target transcripts are essential for post-transcriptional regulation of gene expression in mammals. However, accurate prediction of RBP motif sites has been difficult because many RBPs recognize short and degenerate sequences. Here we describe a hidden Markov model (HMM)-based algorithm, mCarts, that predicts clustered functional RBP-binding sites by integrating the number and spacing of individual motif sites, their accessibility in local RNA secondary structures, and their cross-species conservation. The algorithm learns and quantifies the rules governing these features from a large number of in vivo RBP-binding sites obtained from cross-linking and immunoprecipitation data. We applied this algorithm to two representative RBP families, Nova and Mbnl, which regulate tissue-specific alternative splicing by interacting with clustered YCAY and YGCY elements, respectively, and predicted their binding sites in the mouse transcriptome. Despite the low information content of individual motif elements, our algorithm made specific predictions that were successfully validated experimentally. Analysis of the predicted sites also revealed cases of extensive and distal RBP-binding sites important for splicing regulation. This algorithm can be readily applied to other RBPs to infer their RNA-regulatory networks. The software is freely available at http://zhanglab.c2b2.columbia.edu/index.php/MCarts.
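
To make the notion of "number and spacing of individual motif sites" concrete, the following minimal Python sketch scans a sequence for a degenerate motif such as YCAY or YGCY and reports the distances between consecutive sites. It is an illustration only and not part of mCarts: the function names (`motif_to_regex`, `find_sites`, `spacings`) and the example sequence are hypothetical, and the sketch ignores accessibility and conservation entirely.

```python
import re

# IUPAC code Y denotes a pyrimidine: C or U (written T in DNA notation).
# Only Y is expanded here; other letters are matched literally.
IUPAC = {"Y": "[CTU]"}


def motif_to_regex(motif):
    """Compile a motif such as 'YCAY' into a regex.

    A lookahead is used so that overlapping sites are all reported.
    """
    body = "".join(IUPAC.get(base, base) for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif):
    """Return 0-based start positions of every motif match, overlaps included."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


def spacings(positions):
    """Return distances between the starts of consecutive motif sites."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Made-up example sequence, not taken from any real transcript.
seq = "GGUCAUUCAUCAUAAGG"
sites = find_sites(seq, "YCAY")
print(sites)            # [2, 6, 9]
print(spacings(sites))  # [4, 3]
```

Three YCAY sites within a dozen nucleotides, as in this toy sequence, are the kind of tight cluster that the abstract refers to; a single isolated match carries far less information.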

Highlights

  • Mammals express hundreds of RNA-binding proteins (RBPs) interacting with specific target transcripts even in a single tissue like brain [1]

  • To map sites of protein–RNA interactions in an unbiased manner, we previously developed a biochemical assay named cross-linking and immunoprecipitation (CLIP) to isolate RNA fragments that are directly bound by an RBP [29,30]

  • We describe a hidden Markov model (HMM)-based algorithm and software tool, named mCarts, that takes advantage of massive HITS-CLIP datasets to learn models of RBP-binding sites optimized for global prediction
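
As a rough intuition for how an HMM can pick out clusters of motif sites, the toy sketch below runs the Viterbi algorithm over a two-state model. Everything in it is an assumption made for illustration: the states (`background`, `cluster`), the two observation symbols (`near` or `far`, meaning the spacing to the previous motif site is small or large), and all probabilities are invented. The actual mCarts model and its parameters, which are learned from HITS-CLIP data, are not reproduced here.

```python
import math

# Hypothetical two-state model; all numbers are made up for illustration.
STATES = ("background", "cluster")
START = {"background": 0.9, "cluster": 0.1}
TRANS = {
    "background": {"background": 0.9, "cluster": 0.1},
    "cluster": {"background": 0.2, "cluster": 0.8},
}
# Each motif site emits "near" or "far" depending on its distance
# to the previous site.
EMIT = {
    "background": {"near": 0.2, "far": 0.8},
    "cluster": {"near": 0.9, "far": 0.1},
}


def viterbi(observations):
    """Return the most probable state path for a list of observations."""
    if not observations:
        return []
    # score[s] is the best log-probability of any path ending in state s.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this step.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi(["far", "near", "near", "near", "far"]))
# ['background', 'cluster', 'cluster', 'cluster', 'background']
```

With these invented parameters, the run of closely spaced sites in the middle is labeled as a cluster while the isolated sites at either end are assigned to background.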


Summary

Introduction

Mammals express hundreds of RNA-binding proteins (RBPs) that interact with specific target transcripts, even within a single tissue such as the brain [1]. RBPs appear to use several strategies to achieve sufficient targeting specificity [18], including coexpression of RBPs and their substrate transcripts in specific temporal or spatial windows, which limits the search space, and cooperative binding of different RBPs to proximal sites (i.e., RNA motif modules), which lets them stabilize each other. Another important mechanism is the additive or synergistic binding of multiple RNA-binding domains (RBDs) of a single RBP [19].

