Abstract

Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal peptide and/or one or more transmembrane segments, and the predictions themselves are a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit (GMTK). Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
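
The two-stage decoder described above combines posterior decoding with Viterbi-style grammar constraints. The sketch below illustrates that general idea on an ordinary discrete HMM rather than a DBN. It assumes a scheme in the spirit of posterior-Viterbi decoding: a forward-backward pass gives per-position state posteriors, then a Viterbi pass maximizes the product of those posteriors over paths that use only transitions the model allows. Philius itself is a DBN implemented in GMTK, so treat this as an illustration of the concept rather than the paper's decoder. The three-state grammar (inside, membrane, outside), the two-letter residue alphabet, the probabilities, and the function names are all hypothetical.

```python
import math


def forward_backward(obs, start, trans, emit):
    """Per-position state posteriors P(state_t = k | obs) for a discrete HMM.

    Forward and backward messages are rescaled at every position so that
    long sequences do not underflow; posteriors are renormalized at the end.
    """
    n, K = len(obs), len(start)
    alpha = [[0.0] * K for _ in range(n)]
    scale = [0.0] * n
    for t in range(n):
        for k in range(K):
            if t == 0:
                prior = start[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(K))
            alpha[t][k] = prior * emit[k][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(
                trans[j][k] * emit[k][obs[t + 1]] * beta[t + 1][k] for k in range(K)
            ) / scale[t + 1]

    posteriors = []
    for t in range(n):
        joint = [alpha[t][k] * beta[t][k] for k in range(K)]
        total = sum(joint)
        posteriors.append([x / total for x in joint])
    return posteriors


def posterior_viterbi(posteriors, start, trans):
    """Path maximizing the product of posteriors, using only allowed transitions.

    A first state k is allowed when start[k] > 0, and a step j -> k when
    trans[j][k] > 0, so the decoded path always obeys the model's grammar.
    """
    n, K = len(posteriors), len(start)
    neg_inf = float("-inf")

    def log_or_neg_inf(x):
        return math.log(x) if x > 0 else neg_inf

    score = [[neg_inf] * K for _ in range(n)]
    back = [[0] * K for _ in range(n)]
    for k in range(K):
        if start[k] > 0:
            score[0][k] = log_or_neg_inf(posteriors[0][k])
    for t in range(1, n):
        for k in range(K):
            best_j, best = 0, neg_inf
            for j in range(K):
                if trans[j][k] > 0 and score[t - 1][j] > best:
                    best_j, best = j, score[t - 1][j]
            score[t][k] = best + log_or_neg_inf(posteriors[t][k])
            back[t][k] = best_j

    # Trace back from the highest-scoring final state.
    path = [max(range(K), key=lambda k: score[n - 1][k])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy grammar: 0 = inside (i), 1 = membrane (M), 2 = outside (o).
# Allowed steps: i->i, i->M, M->M, M->o, o->o (i can never jump straight to o).
START = [1.0, 0.0, 0.0]
TRANS = [[0.8, 0.2, 0.0],
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 1.0]]
# Hypothetical two-letter alphabet: 0 = hydrophobic residue, 1 = polar residue.
EMIT = [[0.2, 0.8],
        [0.9, 0.1],
        [0.2, 0.8]]
LABELS = "iMo"

obs = [1, 1, 0, 0, 0, 1, 1]
post = forward_backward(obs, START, TRANS, EMIT)
path = posterior_viterbi(post, START, TRANS)
print("".join(LABELS[k] for k in path))  # iiMMMoo
```

On this clean toy input the per-position argmax of the posteriors already happens to be grammatical, so the result is iiMMMoo. The second pass matters when it is not: plain posterior decoding picks each position independently and can, for example, place an outside label directly after an inside label, a step this transition matrix forbids, whereas the constrained pass can only return paths the grammar permits.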

Highlights

  • The structure of a protein determines its function

  • Computational methods for predicting the basic topology of a transmembrane protein are of great interest, and these methods must be able to distinguish between mature, membrane-spanning proteins and proteins that, when first synthesized, contain an N-terminal membrane-spanning signal peptide

  • A Philius Web server is available to the public as well as precomputed predictions for over six million proteins in the Yeast Resource Center database


Introduction

The structure of a protein determines its function. Knowledge of the structure can be used to guide the design of drugs, to improve the interpretation of other information such as the locations of mutations, and to identify remote protein homologs.

Topology prediction can be framed as a supervised sequence-labeling problem. The training set consists of pairs of sequences of the form (o, s), where o = o_1, ..., o_n is the sequence of amino acids for a protein of known topology and s = s_1, ..., s_n is the corresponding sequence of labels. A learned model with parameters H takes as input a single amino acid test sequence o and seeks to predict the ‘best’ corresponding label sequence s* (with no unknowns). We solve this problem using a DBN, which we call Philius. Before describing the details of our model, we first review HMMs and explain how they are a simple form of DBN. A recently published primer [13] provides an introduction to probabilistic inference using Bayesian networks for a variety of applications in computational biology.
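
For reference, the HMM special case can be written out explicitly. Assuming, for simplicity, that each hidden state is identified with exactly one label (practical topology models use many states per label), an HMM with parameters H factorizes the joint probability of o and s as a chain in which each position contributes one transition term and one emission term. Viewed as a DBN, this is a single time slice, holding a hidden label variable and an observed amino acid variable, repeated n times. Taking ‘best’ to mean the single most probable label sequence (the Viterbi criterion, which is one choice among several) gives the second line; the last equality holds because P(o | H) does not depend on s.

$$P(o, s \mid H) = P(s_1 \mid H)\, P(o_1 \mid s_1, H) \prod_{t=2}^{n} P(s_t \mid s_{t-1}, H)\, P(o_t \mid s_t, H)$$

$$s^* = \arg\max_{s} P(s \mid o, H) = \arg\max_{s} P(o, s \mid H)$$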
