Validation of the data deposited in the Protein Data Bank is of the utmost importance, since many other databases, data mining processes, and artificial intelligence tools depend directly on these data. The present paper is divided into two parts. The first part describes and analyzes the validation methods that have been designed and used by the structural biology community. Validation began with the Ramachandran plot, with its allowed and disallowed backbone conformations, and evolved in several directions: the inclusion of additional stereochemical features, analyses of the distributions of structural moieties, and scrutiny of structure factor amplitudes across the reciprocal lattice. The second part of the paper focuses on a largely unexplored problem, the high number of false positives among the sodium(I) cations observed in protein crystal structures. It is demonstrated that these false positives, that is, atoms wrongly identified as sodium, can be detected by using electrostatic considerations, and it is anticipated that this approach can be extended to other alkali and alkaline earth cations or to monoatomic anions. Finally, I argue that a global initiative, open to all volunteers and possibly overseen by the Protein Data Bank, should replace the numerous web servers and software applications now in use and provide the community with a small set of reliable and widely accepted tools.