Abstract

A residue-specific approach is introduced for improving the accuracy of knowledge-based statistical potentials used for quality assessment of protein models. The method rests on the assumption that substructure motifs common to different protein folds can still share similar patterns of interactions with neighboring residues. Instead of the classic approach, in which a potential is derived from a set of non-homologous whole-chain structures, we built a potential composed of independent sub-potentials, each specific to one residue in the target sequence. To this end, the target sequence was split into short linear segments, and every segment was threaded through the PDB to find proteins that share sequence similarity with it. After removing the hits that were homologous to the target sequence, the remaining proteins were used to derive a statistical sub-potential for the residue at the center of that segment. This procedure was repeated for every residue in the target protein. We applied this methodology to create a residue-specific variant of the DFIRE statistical potential. For CASP9 single-domain targets, the average per-target Pearson correlation coefficient between the pseudo-energy predicted by the residue-specific DFIRE potential and the GDT_TS score was 0.656, compared with 0.561 for the classic DFIRE potential. We believe that, given the current and still growing size of the PDB, this methodology can be successfully applied to increase the accuracy of other state-of-the-art statistical potentials.
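
The segmentation step described above can be illustrated with a minimal sketch. It only shows how one short linear segment might be generated around each residue of a target sequence. The function name `centered_segments`, the `half_width` parameter, its default value, and the truncation of windows at the chain termini are all assumptions made for illustration; the abstract does not state the segment length or how the ends of the chain are handled, and the threading and potential-derivation steps are not shown.

```python
from typing import List, Tuple


def centered_segments(sequence: str, half_width: int = 4) -> List[Tuple[int, str]]:
    """Return one (center_index, segment) pair per residue of `sequence`.

    Each segment spans up to `half_width` residues on either side of the
    central residue. Near the chain termini the window is truncated rather
    than padded (an illustrative assumption, not taken from the source).
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")

    segments = []
    for center in range(len(sequence)):
        start = max(0, center - half_width)
        end = min(len(sequence), center + half_width + 1)
        segments.append((center, sequence[start:end]))
    return segments


# Hypothetical 10-residue target sequence, windows of up to 5 residues.
segments = centered_segments("MKTAYIAKQR", half_width=2)
for center, segment in segments:
    print(center, segment)
```

In a full pipeline, each segment would then be searched against the PDB, and the non-homologous hits would supply the statistics for the sub-potential of the residue at `center_index`.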
