Abstract

Accurate prediction of DNA-binding proteins (DBPs) helps to uncover the underlying mechanisms of DNA-protein interaction and thus contributes to the understanding of many biochemical processes. Although computational methods have made great progress, there is still much room for improvement in DBP prediction. In this study, a lightweight and interpretable sequence-based method, named LBi-DBP, is designed to improve the performance of DBP prediction. In LBi-DBP, five sequence feature sources are first fed to a lightweight bidirectional long short-term memory (BiLSTM)-based neural network module to extract important sequence context information; then, a multi-layer perceptron (MLP) module, taking as inputs the sequence context information extracted by the BiLSTM module and a pseudo sequence order feature, is used to learn the optimal or sub-optimal parameters. Experimental results on two independent test sets, UniSwiss-Tst and PDB2272, demonstrate that LBi-DBP achieves Matthews correlation coefficient (MCC) values of 0.762 and 0.574, respectively, both higher than those of the second-best existing DBP prediction method (0.741 and 0.424). Good prediction results on a disease-related test set further show the value of LBi-DBP. Through interpretability analysis, we find that the BiLSTM module can focus on residues in the native DNA-binding domain region without prior knowledge, thus improving DBP prediction performance.
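
For readers unfamiliar with the evaluation metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient is computed for a binary classifier from confusion-matrix counts. It uses the standard definition of the metric and is not taken from the LBi-DBP implementation; the function names `confusion_counts` and `matthews_corrcoef` and the example counts are illustrative assumptions, not data from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 = DBP and 0 = non-DBP."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard binary MCC; returns 0.0 when the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen only to illustrate the calculation.
example_mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
print(round(example_mcc, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative when the positive and negative classes are imbalanced, which is one common reason it is reported for DBP prediction benchmarks.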
