wwPDB Research Articles

Background: Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.

Results: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the approved drug-drug target space. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets.

Conclusion: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
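As a rough illustration of the kind of analysis the Results describe, the sketch below projects two descriptor tables onto shared principal components and then computes a nearest-neighbour overlap score. It is not the authors' implementation: the abstract does not define the overlap measure, so the score used here (the fraction of points in one dataset whose nearest neighbour in the pooled set belongs to the other dataset) is an assumption, as are the function names `pca_scores` and `nn_overlap` and the random stand-in descriptor data.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant descriptors unscaled
    Z = (X - X.mean(axis=0)) / std  # autoscale each descriptor
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def nn_overlap(scores_a, scores_b):
    """Assumed overlap measure: fraction of points in A whose nearest
    neighbour in the pooled set (excluding the point itself) is from B."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    pooled = np.vstack([scores_a, scores_b])
    n_a = len(scores_a)
    # Euclidean distances from every point in A to every pooled point.
    dist = np.linalg.norm(scores_a[:, None, :] - pooled[None, :, :], axis=2)
    dist[np.arange(n_a), np.arange(n_a)] = np.inf  # ignore self-matches
    nearest = dist.argmin(axis=1)
    return float(np.mean(nearest >= n_a))


# Random stand-ins for the two descriptor tables (hypothetical data).
rng = np.random.default_rng(0)
structural = rng.normal(0.0, 1.0, size=(200, 6))
approved = rng.normal(0.5, 1.0, size=(100, 6))

# Fit one global PCA model on the pooled data, then split the scores.
scores = pca_scores(np.vstack([structural, approved]), n_components=2)
overlap = nn_overlap(scores[:200], scores[200:])
print(f"nearest-neighbour overlap: {overlap:.2f}")
```

A score near 0 would suggest the two datasets occupy separate regions of the projected space, while larger values suggest they are intermixed. With datasets of unequal size the score is not symmetric, so it may be worth computing it in both directions.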

We are pleased to announce that a set of guidelines supplementing the International Union of Pure and Applied Chemistry’s (IUPAC) Recommendations for the Presentation of NMR Structures of Proteins and Nucleic Acids [J. L. Markley et al. (1998), J. Mol. Biol. 280, 933–952] is now available from our web pages at http://journals.iucr.org/f/services/structuralcommunications/. The guidelines will be used by the referees and editors of Acta Crystallographica Section F to evaluate future submissions of structure reports determined by NMR spectroscopy until a more comprehensive review of standards is reported. These guidelines for NMR data are mentioned in the 2008 Notes for authors, which appear in this issue, and in the coming weeks they will be expanded and annotated with examples of good publication practice.

‘Where are the submissions of structure reports which have been determined by use of NMR?’ you may ask. In reply, we point to our editorial in June 2006, which announced the publication in that issue of Solution structure of Arabidopsis thaliana protein At5g39720.1, a member of the AIG2-like protein family [B. L. Lytle et al. (2006), Acta Cryst. F62, 490–493]. Publication of NMR structure reports in Acta Cryst. F underscores the journal’s commitment to serve the structural genomics community, and we look forward to other submissions in the future. At the same time, the journals of the International Union of Crystallography (IUCr) continue to hold to the highest standards of review and reproducibility in their crystallographic papers, and this update expresses the commitment that those same high standards be maintained for their NMR papers as well.

The guidelines were drafted on 21 July 2007 at a workshop, whose title we chose for this editorial, held as part of the program of the American Crystallographic Association (ACA) meeting in Salt Lake City, Utah. The workshop was sponsored by the IUCr with additional support from Bruker BioSpin Corp., Cambridge Isotope Labs and Varian, Inc. Participants included representatives from the NMR structure community, IUCr Journals, the relevant databases, namely the Worldwide Protein Data Bank (wwPDB) and the Biological Magnetic Resonance Data Bank (BMRB), and NMR instrumentation. We thank the ACA and the meeting organizers for hosting this productive workshop, the IUCr and the NMR support companies for their generous financial support, and the participants, who worked together so congenially and productively to produce this important result.

The workshop, it seems, anticipated an emergent effort by the NMR community itself to update the 1998 IUPAC standards to reflect technical developments and current best practices. That effort belongs to the NMR Task Force, an advisory body of the wwPDB that provides advice and recommendations on standards for the submission of NMR-derived biomolecular structures and supporting data. Over the years, the aims and goals of the PDB and the journals serving structural biology have coalesced around a number of issues of mutual concern, and they have worked together to good effect on behalf of the structural biology community. The work the NMR Task Force has undertaken is another example of the continuing good service the PDB and its leadership have provided to structural biology in anticipating its needs and undertaking to satisfy them. We await with interest the report of the NMR Task Force and wish them success in their deliberations. If we have been in some way helpful to their effort, we are pleased to have been so.


Articles published on wwPDB

A chemogenomics view on protein-ligand spaces.

Data Deposition and Annotation at the Worldwide Protein Data Bank

Development of Protein Structure Databases and their Applications to Functional Annotation

Topological Classification of RNA Structures

Protein structure databases with new web services for structural biology and biomedical research

BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions.

Validation of macromolecular structures: updating standards for publication of NMR structures in an IUCr journal

BioMagResBank.

Remediation of the protein data bank archive.

A global analysis of NMR distance constraints from the PDB

A comparative view at comprehensive information resources on three-dimensional structures of biological macro-molecules

The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data

Using MSDchem to Search the PDB Ligand Dictionary

E-MSD: improving data deposition and structure quality

E-MSD: an integrated data resource for bioinformatics
