Abstract

Accurately identifying the protein DNA-binding residues is important for understanding the protein–DNA recognition mechanism and protein function annotation. Many computational methods have been proposed to predict protein–DNA binding residues using protein sequence information; however, for severe imbalanced data like the protein–DNA binding dataset, the under-sampling technique which is applied by most previous methods cannot achieve satisfactory performance. In this study, an adjustment algorithm is proposed to offset the biased prediction results from the classifier. The proposed adjustment algorithm uses the binding probability between the target residue and its neighboring residues to identify more true binding residues which are wrongly predicted as non-binding. The proposed prediction method with adjustment algorithm achieves an area under the ROC curve (AUC) of 0.926 and 0.866 on two benchmark datasets and 0.882 on the independent testing set, which demonstrates that the proposed method can efficiently predict specific residues for protein–DNA interactions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call