Abstract

Protein-RNA interactions play essential roles in many biological processes. Quantifying the binding affinity of protein-RNA complexes helps in understanding protein-RNA recognition mechanisms and in identifying strong binding partners. Because experimentally measured protein-RNA binding affinity data are still limited, there is a pressing demand for accurate and reliable computational approaches. In this paper, we propose PredPRBA, a computational approach that predicts protein-RNA binding affinity using gradient boosted regression trees (GBRT). We build a protein-RNA binding affinity dataset of 103 protein-RNA complex structures manually collected from the literature. We then generate 37 kinds of sequence and structural features and explore the relationship between these features and protein-RNA binding affinity. We find that the binding affinity depends mainly on the structure of the RNA molecules. We split the 103 complexes into six categories according to the type of RNA bound to the protein in each complex, and for each category we build a GBRT model based on the generated features. We evaluate the proposed method on the binding affinity dataset using leave-one-out cross-validation. PredPRBA achieves correlations ranging from 0.723 to 0.897 across the six categories, significantly better than other typical regression methods and the pioneering protein-RNA binding affinity predictor SPOT-Seq-RNA. In addition, we have developed a user-friendly web server for predicting the binding affinity of protein-RNA complexes. The PredPRBA web server is freely available at http://PredPRBA.denglab.org/.
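
The evaluation protocol described in the abstract (a gradient boosted regression model scored by leave-one-out cross-validation and a correlation coefficient) can be illustrated with a minimal sketch. This is not the PredPRBA implementation: the data are synthetic, the trees are single-split stumps written in plain Python, and the names `fit_gbrt`, `leave_one_out_predictions`, and `pearson`, together with the tree count and learning rate, are assumptions chosen for illustration. The paper's 37 features and its actual hyperparameters are not reproduced here.

```python
import math
import random
import statistics


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best_sse, best = math.inf, None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Use every distinct value except the largest as a threshold, so both
        # sides of the split are guaranteed to be non-empty.
        for t in values[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = statistics.fmean(residuals)
        best = (0, math.inf, m, m)
    return best


def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gbrt(X, y, n_trees=30, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = statistics.fmean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbrt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def leave_one_out_predictions(X, y, **kwargs):
    """Predict each sample with a model trained on all the other samples."""
    out = []
    for i in range(len(X)):
        model = fit_gbrt(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kwargs)
        out.append(predict_gbrt(model, X[i]))
    return out


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in data (assumption): 20 "complexes", 3 features, and an
# "affinity" driven by the first feature plus a little Gaussian noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(20)]
y = [3.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]

preds = leave_one_out_predictions(X, y, n_trees=30, learning_rate=0.3)
r = pearson(y, preds)
print(f"Leave-one-out Pearson r = {r:.3f}")
```

In a setup like the one the abstract describes, this procedure would be repeated once per RNA category, giving one correlation value per category. A library implementation of gradient boosting would normally replace the hand-written stumps.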

Highlights

  • Protein-RNA interactions play a crucial role in many biological processes, such as gene expression and its regulation (Keene, 2007; Glisovic et al, 2008)

  • Tuszynska and Bujnicki (2011) published two knowledge-based scoring functions that were tested on eight unbound protein-RNA docking baits produced by the GRAMM program

  • Li et al (2012) examined residue-nucleotide propensities and found that the secondary structure of RNA plays a crucial role in predicting the residue-nucleotide propensity potential

Introduction

Protein-RNA interactions play a crucial role in many biological processes, such as gene expression and its regulation (Keene, 2007; Glisovic et al, 2008). Tuszynska and Bujnicki (2011) published two knowledge-based scoring functions that were tested on eight unbound protein-RNA docking baits produced by the GRAMM program. Their results showed that these potentials identified near-native structures in four of the eight cases. The protein-RNA docking benchmark dataset has been widely used to develop computational methods for studying protein-RNA interactions, including docking (Guilhot-Gaudeffroy et al, 2014; Guo et al, 2013; Iwakiri et al, 2016) and knowledge-based scoring functions (Huang and Zou, 2014; Yan and Wang, 2013), as well as to study the prediction of RNA binding sites in protein structures (Miao and Westhof, 2015), the role of water molecules at the protein-RNA interface (Barik and Bahadur, 2014), and the discovery of binding hotspots at the protein-RNA interface (Barik et al, 2015).

