Abstract

Hot spot residues at protein–RNA complexes are vitally important for investigating the underlying molecular recognition mechanism. Accurately identifying protein–RNA binding hot spots is critical for drug designing and protein engineering. Although some progress has been made by utilizing various available features and a series of machine learning approaches, these methods are still in the infant stage. In this paper, we present a new computational method named XGBPRH, which is based on an eXtreme Gradient Boosting (XGBoost) algorithm and can effectively predict hot spot residues in protein–RNA interfaces utilizing an optimal set of properties. Firstly, we download 47 protein–RNA complexes and calculate a total of 156 sequence, structure, exposure, and network features. Next, we adopt a two-step feature selection algorithm to extract a combination of 6 optimal features from the combination of these 156 features. Compared with the state-of-the-art approaches, XGBPRH achieves better performances with an area under the ROC curve (AUC) score of 0.817 and an F1-score of 0.802 on the independent test set. Meanwhile, we also apply XGBPRH to two case studies. The results demonstrate that the method can effectively identify novel energy hotspots.

Highlights

  • IntroductionProtein–RNA interaction site prediction is of great significance and helps us understand how protein function is achieved so that we can better understand and study the various features of cells [1,2,3,4,5]

  • The proteins and nucleic acids constitute the two most important types of biological diversity in living organisms, and they each have their structural characteristics and particular fixed function.Protein–RNA interaction site prediction is of great significance and helps us understand how protein function is achieved so that we can better understand and study the various features of cells [1,2,3,4,5].Among the protein–RNA interface residues, as is known to all, only a small number of hot spots are essential for the binding free energy

  • F-score features were generated from four sources of information: network, exposure, structure, of each feature on the training dataset using XGBoost with 10-fold cross-validation

Read more

Summary

Introduction

Protein–RNA interaction site prediction is of great significance and helps us understand how protein function is achieved so that we can better understand and study the various features of cells [1,2,3,4,5]. Among the protein–RNA interface residues, as is known to all, only a small number of hot spots are essential for the binding free energy. Sufficient identification of these hot spots helps to better understand the molecular mechanisms. The interactions of protein with small molecule compounds are the basis for drug design, and structure-based drug design has achieved great success in the development of drugs [6,7]. The precise localization of hot spots can elucidate the principle of protein–RNA interactions and provide a very significant theoretical support and a basis for target drug preparation

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call