Abstract

Background: Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of gram-negative bacteria, and of discriminating those proteins from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences.

Results: The training has been performed on a non-redundant database of 14 outer membrane proteins with structures known at atomic resolution. The model has been tested with a jackknife procedure, yielding a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas for the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The total number of correctly predicted topologies is 10 out of 14 in the self-consistency test, and 9 out of 14 in the jackknife test. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with success rates of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins respectively, the highest rates reported in the literature. That test has been performed independently on a set of known outer membrane proteins with low sequence identity with each other and also with the proteins of the training set.

Conclusion: Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins. The results were satisfactory, thus the method presented here appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at: , and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology.
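As an illustration of the two per-residue measures quoted above, the following minimal sketch computes an accuracy and a correlation coefficient from an observed and a predicted labelling of the same sequence. It assumes the correlation coefficient is the Matthews correlation coefficient for the two-state problem (strand versus non-strand), which the abstract does not state explicitly. The function name `per_residue_scores` and the label alphabet (`M` for a transmembrane strand residue, any other character for a non-membrane residue) are hypothetical and are not taken from the published method.

```python
from math import sqrt


def per_residue_scores(observed: str, predicted: str) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation) for two label strings.

    Assumption: 'M' marks a transmembrane strand residue and every other
    character marks a non-membrane residue. This alphabet is illustrative.
    """
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")

    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == "M" and pred == "M":
            tp += 1  # strand residue predicted as strand
        elif obs != "M" and pred != "M":
            tn += 1  # non-strand residue predicted as non-strand
        elif pred == "M":
            fp += 1  # non-strand residue predicted as strand
        else:
            fn += 1  # strand residue predicted as non-strand

    accuracy = (tp + tn) / len(observed)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The coefficient is undefined when a row or column of the confusion
    # table is empty; returning 0.0 in that case is a convention chosen here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    acc, mcc = per_residue_scores("MMMM----", "MMM-M---")
    print(f"accuracy={acc:.3f} correlation={mcc:.3f}")
```

In a jackknife setting, a function of this kind would be applied to each held-out protein after training on the remaining ones, and the counts or scores would then be aggregated; how the original study aggregated them is not specified in this summary.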

Highlights

  • Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes

  • Two (2) additional strands were predicted correctly but slightly misplaced from their observed positions. These misplaced strands, which belong to the proteins with Protein Data Bank (PDB) codes 1PRN and 2POR, were the only falsely predicted strands

  • We report 236 predicted outer membrane proteins in the E. coli proteome, compared to 118 in [3] and 200 in [4], because we chose to retain the threshold obtained from cross-validation

Summary

Introduction

Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. A variety of algorithms and computational techniques have been proposed for the prediction of the transmembrane segments of α-helical membrane proteins, with high accuracy and precision. The β-barrel membrane proteins are located in the outer membrane of gram-negative bacteria, and presumably in the outer membrane of chloroplasts and mitochondria. During the last few years, more β-barrel proteins were found in the bacterial outer membrane, and a number of structures have been solved at atomic resolution [2]. These proteins perform a wide variety of functions such as active ion transport, passive nutrient uptake, membrane anchoring, adhesion, and catalytic activity. Considering the important biological functions in which outer membrane proteins are involved, it is not a surprise that those proteins attract increased medical interest. This is confirmed by the continuously increasing number of completely sequenced genomes of gram-negative bacteria deposited in the public databases. For the reasons mentioned above, there is clearly a need to develop computational tools for predicting the membrane-spanning strands of those proteins, and for discriminating them from water-soluble proteins when searching entire genomes.

