Abstract

Net charge, electric dipole moment and quadrupole moment tensors were calculated for 78 amino acid sequences from 62 representative DNA-binding proteins with known structures. The magnitudes of the moments of the electric charge distribution in these chains differ significantly from those of a non-binding control data set. Net charge, net dipole moment and quadrupole moment could each distinguish binding from non-binding proteins with 82.6%, 77.4% and 73.7% accuracy, respectively, using single-variable predictors without cross-validation. With hybrid predictors that combine charge and both moments, the best prediction accuracies were 85.6% without cross-validation and 83.9% on the cross-validated data sets. This level of accuracy, obtained with such simple descriptors, is competitive with results from more complex models that use many descriptors. Coarse-graining the atomic charges onto Cα atoms did not reduce the prediction accuracy significantly, which suggests that Cα coordinates derived from homology modeling can be used to predict DNA-binding proteins. The speed and accuracy of this method, in combination with homology-based methods of structure prediction, should enhance genome-wide recognition of DNA-binding proteins.
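As an illustration of the three descriptors named above, the following minimal sketch computes the net charge, dipole moment vector and quadrupole moment tensor for a set of point charges, such as residue charges placed on Cα positions. It is not the authors' implementation. The function name `charge_moments`, the choice of the geometric center as origin, and the traceless quadrupole convention Q_ab = Σ q (3 r_a r_b − r² δ_ab) are all assumptions, since the abstract does not state which conventions were used.

```python
import math


def charge_moments(coords, charges):
    """Return (net_charge, dipole, quadrupole) for a set of point charges.

    coords  : list of (x, y, z) positions, e.g. C-alpha coordinates
    charges : list of charges, one per position
    Assumption: moments are taken about the geometric center of the
    coordinates, and the quadrupole uses the traceless convention
    Q_ab = sum_i q_i * (3 * r_a * r_b - r^2 * delta_ab).
    """
    n = len(coords)
    if n == 0 or n != len(charges):
        raise ValueError("coords and charges must be non-empty and equal in length")

    # Geometric center of the point set (assumed origin for the moments).
    center = [sum(pos[k] for pos in coords) / n for k in range(3)]

    net_charge = sum(charges)
    dipole = [0.0, 0.0, 0.0]
    quadrupole = [[0.0] * 3 for _ in range(3)]

    for pos, q in zip(coords, charges):
        r = [pos[k] - center[k] for k in range(3)]
        r2 = sum(x * x for x in r)
        for a in range(3):
            dipole[a] += q * r[a]
            for b in range(3):
                delta = r2 if a == b else 0.0
                quadrupole[a][b] += q * (3.0 * r[a] * r[b] - delta)

    return net_charge, dipole, quadrupole


def dipole_magnitude(dipole):
    """Euclidean norm of the dipole vector."""
    return math.sqrt(sum(x * x for x in dipole))


# Toy usage: two opposite unit charges separated along the x axis.
net, mu, quad = charge_moments([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [1.0, -1.0])
```

In this toy case the net charge is zero while the dipole points along x, showing that the descriptors carry complementary information. Per-protein values from such a function could then be fed to a single-variable or combined classifier, although the specific predictors used in the study are not described in the abstract.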
