Abstract
Background
Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but annotation of small molecule binding sites on protein sequences is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly, it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and, unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites.
Description
Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at . The SMID-BLAST tool provides accurate transitive annotation of small molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to the hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB. For 472 (60%) of these, the predicted interaction identically matched the experimental small molecule, and of these 472, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.
Conclusion
By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.
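The validation figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three counts stated there (793, 472 and 344). The helper `site_recovery` is hypothetical: it assumes that "binding site residues correctly identified" means the fraction of experimental binding site residues that also appear in the prediction, which the abstract does not spell out.

```python
from __future__ import annotations

# Counts quoted in the abstract.
TOTAL_INTERACTIONS = 793  # experimental small molecule interactions in the validation set
LIGAND_MATCHES = 472      # predictions that identically matched the experimental small molecule
SITE_MATCHES = 344        # of those, predictions with > 80% of binding site residues correct


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def site_recovery(predicted: set[int], experimental: set[int]) -> float:
    """Fraction of experimental binding site residues that were also predicted.

    Assumption (not stated in the source): recovery is measured relative to
    the experimental residue set, using residue positions as integers.
    """
    if not experimental:
        raise ValueError("experimental binding site must not be empty")
    return len(predicted & experimental) / len(experimental)


ligand_match_rate = percent(LIGAND_MATCHES, TOTAL_INTERACTIONS)  # 59.5, reported as 60%
site_rate_given_match = percent(SITE_MATCHES, LIGAND_MATCHES)    # 72.9
site_rate_overall = percent(SITE_MATCHES, TOTAL_INTERACTIONS)    # 43.4
```

Read this way, roughly 73% of the correctly identified ligands also had more than 80% of their binding site residues recovered, which corresponds to about 43% of the full validation set.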
Highlights
Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse
Using Small Molecule Interaction Database (SMID)-BLAST, likely small molecule targets can be predicted for an arbitrary protein sequence, with the only limitation being that the small molecule must exist in the Protein Data Bank (PDB) in order to be predicted
Summary
SMID provides an extremely useful extrapolation of the small molecule interaction information implicitly stored in the PDB. The SMID-BLAST validation results, coupled with the specific examples listed, illustrate that SMID-BLAST has a high degree of accuracy in terms of both identifying a small molecule ligand and predicting the binding site residue positions for a query protein. This level of accuracy will only increase as more protein structures are deposited into the PDB and more interactions are computed for SMID. Since small molecule binding sites are generally highly conserved among members of a given protein family, structure templates can be chosen that possess a high degree of sequence similarity with the SMID-BLAST predicted binding site residues.
http://www.biomedcentral.com/1471-2105/7/152
[...] appear to bind the same site on the query protein. From this listing, an occupancy value of 1 is given to the small molecule hit with the greatest number of binding site residues.
PDB identifiers and chains for the SMID-BLAST validation test set: http://www.biomedcentral.com/content/supplementary/14712105-7-152-S4.txt
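The occupancy rule mentioned in the Summary (an occupancy value of 1 for the small molecule hit with the greatest number of binding site residues) can be sketched as follows. This is a minimal illustration, not the SMID-BLAST implementation: the `LigandHit` class, the function name and the placeholder ligand names are all hypothetical, the hits are assumed to have already been grouped as binding the same site, ties are broken by list order, and the occupancy of the remaining hits is left unset because the source text does not say how it is assigned.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class LigandHit:
    """Hypothetical record for one predicted small molecule hit."""
    ligand: str                      # placeholder ligand name
    site_residues: FrozenSet[int]    # predicted binding site positions on the query
    occupancy: Optional[int] = None  # unset until a rule assigns it


def assign_top_occupancy(hits: List[LigandHit]) -> Optional[LigandHit]:
    """Give occupancy 1 to the hit with the most binding site residues.

    Assumes every hit in `hits` binds the same site on the query protein.
    On a tie, the first such hit in the list is chosen (an assumption).
    Other hits are left untouched.
    """
    if not hits:
        return None
    top = max(hits, key=lambda hit: len(hit.site_residues))
    top.occupancy = 1
    return top


# Usage with made-up hits that share one site.
example_hits = [
    LigandHit("LIG_A", frozenset({12, 15, 40})),
    LigandHit("LIG_B", frozenset({12, 15, 40, 41, 77})),
    LigandHit("LIG_C", frozenset({15, 40})),
]
top_hit = assign_top_occupancy(example_hits)
```

Here `top_hit` is the `LIG_B` record, since it covers five residues, while the other two hits keep `occupancy` as `None`.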