Abstract

We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value: the fraction of predicted functional sites that are correct. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low-precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and an average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared the performance of our method with that of Pfam, a state-of-the-art sequence-level annotator, in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta.

Highlights

  • In the past decade, the amount of three-dimensional structural information for biological macromolecules has increased greatly, partly through technological advances and partly through the structural genomics initiatives that have prioritized the systematic determination of protein and nucleic acid structures [1] using X-ray crystallography, Nuclear Magnetic Resonance, electron microscopy, and other methods

  • We considered a structure to be a positive example if PROSITE indicated that it contained the functional site being modeled

  • We found that the combination of FEATURE and a Support Vector Machine classifier delivered high recall at the specified level of precision
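
To make "recall at a specified level of precision" concrete, the following is a minimal sketch, not the evaluation code used for FEATURE. It assumes a classifier that emits one score per candidate site and a known true/false label for each; the function names (`precision_recall`, `recall_at_precision`) and the toy scores are hypothetical and chosen only for illustration. The sketch scans candidate thresholds and keeps the one that gives the highest recall while still meeting the precision target.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    # Convention (an assumption): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def recall_at_precision(scores: List[float], labels: List[bool],
                        target_precision: float
                        ) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision meets the target, or None if none qualifies."""
    best = None
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall(scores, labels, t)
        if p >= target_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Hypothetical toy data: classifier scores and whether each site is a true site.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, True, False, True, False, False, False]

strict = recall_at_precision(scores, labels, target_precision=0.99)
relaxed = recall_at_precision(scores, labels, target_precision=0.75)
print("strict :", strict)
print("relaxed:", relaxed)
```

On this toy data, demanding 99% precision forces a higher threshold and costs recall, while relaxing the target recovers the remaining true site at the price of one false positive. This is the trade-off the highlight refers to.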

Summary

Introduction

The amount of three-dimensional structural information for biological macromolecules has increased greatly, partly through technological advances and partly through the structural genomics initiatives that have prioritized the systematic determination of protein and nucleic acid structures [1] using X-ray crystallography, Nuclear Magnetic Resonance, electron microscopy, and other methods. Many solved structures have no reported biological function, so computational methods are critical for identifying active sites and understanding their molecular function. Methods based on sequence analysis are very powerful in this regard, as they can recognize domains and 1D motifs associated with function. Several methods have also been developed to seek functional sites using 3D information, including FFFs [2], TESS [3], GASPS [4], MarkUs [5] and FEATURE [6,7].

