Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties

Wenxu Tong, Leonel F Murga, Mary Jo Ondrechen, Ying Wei, Ronald J Williams

doi:10.1371/journal.pcbi.1000266


Open Access

https://doi.org/10.1371/journal.pcbi.1000266


Journal: PLoS Computational Biology	Publication Date: Jan 16, 2009
Citations: 100	License type: CC BY 4.0

Affiliation: Northeastern University

Abstract

A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
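
The idea of a monotonicity-constrained maximum likelihood estimate can be illustrated with a deliberately simplified sketch. The code below is not the POOL algorithm itself: POOL works with several features under a partial order, whereas this example assumes a single hypothetical feature score per residue (a total order) and uses the classic pool-adjacent-violators procedure. For binary labels, that procedure gives the maximum likelihood probabilities subject to the constraint that a higher score never lowers the estimated probability. The function name, the scores, and the labels are all illustrative assumptions.

```python
def pav_monotone_probabilities(scores, labels):
    """Pool-adjacent-violators on one feature (illustrative, not POOL itself).

    scores: one hypothetical feature value per residue (assumed distinct).
    labels: 1 if the residue is annotated as active site, else 0.
    Returns probabilities, in the input order, that are non-decreasing
    in the score.
    """
    # Visit residues from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    # Each block is [sum of labels, number of residues, member indices].
    blocks = []
    for i in order:
        blocks.append([labels[i], 1, [i]])
        # Merge while the previous block's mean exceeds the newest block's
        # mean (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            label_sum, count, members = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2].extend(members)

    # Every residue in a block receives the block's mean label.
    probs = [0.0] * len(scores)
    for label_sum, count, members in blocks:
        for i in members:
            probs[i] = label_sum / count
    return probs


# Toy data: five residues with made-up scores and annotations.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
toy_labels = [0, 1, 0, 1, 1]
toy_probs = pav_monotone_probabilities(toy_scores, toy_labels)

# Rank residues from most to least likely, as in a likelihood ranking.
toy_ranking = sorted(range(len(toy_probs)), key=lambda i: -toy_probs[i])
```

Sorting residues by the resulting probabilities gives a likelihood ranking of the kind the abstract describes. Extending the constraint from a single score to a partial order over several features is what the multidimensional isotonic regression in the paper addresses, and it requires a different solver than the one sketched here.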

Highlights

Development of function prediction capabilities is a major challenge in genomics.
As described in more detail in the Materials and Methods section, the results presented in this paper are based on two sets of proteins: a set of 64 test proteins selected randomly from the Catalytic Site Atlas (CSA) database [16,17] and a 160-protein set covering most of the original CSA database.
We presented the application of the Partial Order Optimum Likelihood (POOL) method, using THEMATICS plus some other features, to protein active site prediction.

Summary

Introduction

Development of function prediction capabilities is a major challenge in genomics. Structural genomics projects are determining the 3D structures of expressed proteins on a high-throughput basis. The determination of function from 3D structure has proved to be a challenging task; the functions of most of these structural genomics proteins remain unknown. Computationally based predictive methods can help to guide and accelerate functional annotation. The first step toward predicting the function of a protein from its 3D structure is to determine its local site of interaction, where catalysis and/or ligand recognition occurs. Such capabilities have many important practical implications for biology and medicine.



