Prediction of 3D metal binding sites from translated gene sequences based on remote‐homology templates

Ronen Levy, Vladimir Sobolev, Marvin Edelman

doi:10.1002/prot.22352

Abstract

Database-scale analysis was performed to determine whether structural models, based on remote homologues, are effective in predicting 3D transition metal binding sites in proteins directly from translated gene sequences. The extent to which side chain modeling alone reduces sensitivity and selectivity is shown to be <10%. Surprisingly, selectivity was not dependent on the level of sequence homology between template and target, or on the presence of a metal ion in the structural template. Applying a modification of the CHED algorithm (Babor et al., Proteins 2008;70:208-217) and machine learning filters, a selectivity of approximately 90% was achieved for protein sequences using unrelated structural templates over a sequence identity range of 18-100%. Below approximately 18% identity, the number of analyzable target-template pairs and the predictability of metal binding sites fall off sharply. A full third of structural templates were found to have target partners only in the remote homology range of 18-30%. In this range, nonmetal-binding templates are calculated to be the majority and serve to predict with 50% sensitivity at the geometric level. Overall, sensitivity at the geometric level for targets having templates in the 18-30% sequence identity range is 73%, with an average of one false positive site per true site. Protein sequences described as "unknown" in the UniProt database and composed largely of unidentified genome project sequences were studied, and their metal binding sites were predicted. A web server for prediction of metal binding sites from protein sequence is provided.

Full Text