Abstract

Protein-nucleic acid interactions play critical roles in many biological processes. Quantifying the binding affinity of protein-nucleic acid complexes helps in understanding the mechanism of protein-nucleic acid recognition and in identifying reliable binding partners. In this paper, we propose a computational approach, PNAB, which predicts protein-nucleic acid binding affinity from sequence using heterogeneous ensemble models. We build a dataset of protein-nucleic acid binding affinities that includes 103 protein-RNA complexes and 100 protein-DNA complexes manually collected from the related literature. We find that the binding affinity mainly depends on the structure of the nucleic acid molecules. According to the type of nucleic acid bound to the protein in each complex, we divide the complexes into 11 categories (six classes for protein-RNA complexes and five classes for protein-DNA complexes). We then extract sequence features from the protein-nucleic acid complexes and, for each category, build a stacking heterogeneous ensemble model on the generated features. We comprehensively evaluate the proposed method on the binding affinity dataset using leave-one-out cross-validation and show that PNAB achieves correlations ranging from 0.84 to 0.95 across the categories, which is significantly better than other typical regression methods and the pioneering protein-nucleic acid binding affinity predictor. In addition, a user-friendly web server has been developed to predict the binding affinity of protein-RNA complexes. The PNAB web server is freely available at http://pnab.denglab.org/.
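
The evaluation protocol named in the abstract (leave-one-out cross-validation scored by a correlation coefficient) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy feature vectors and affinity values are hypothetical, the nearest-neighbor regressor is only a stand-in for PNAB's stacking ensemble, and the abstract does not say which correlation coefficient is reported, so Pearson's r is assumed here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in regressor (NOT PNAB's model): return the target value
    of the training sample closest to x in Euclidean distance."""
    dists = [math.dist(row, x) for row in train_X]
    return train_y[dists.index(min(dists))]


def leave_one_out_predictions(X, y, predict=nearest_neighbor_predict):
    """Hold out each sample in turn, fit on the remaining samples,
    and collect the prediction for the held-out sample."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict(train_X, train_y, X[i]))
    return preds


# Hypothetical toy data: one feature per complex and a made-up affinity value.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.1, 1.0, 2.1, 2.9, 4.2, 5.0]

preds = leave_one_out_predictions(X, y)
r = pearson(preds, y)
print(f"leave-one-out Pearson r = {r:.3f}")
```

Because every prediction is made for a sample the model never saw during fitting, the resulting correlation estimates performance on unseen complexes, which matters for categories with only a handful of members.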
