Abstract

Background

Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, because of the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated.

Results

We present a support vector machine (SVM)-based TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves a topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy, respectively. An additional SVM trained to discriminate between globular and TM proteins produced zero false positives and a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from .

Conclusion

The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to discriminate effectively between TM and globular proteins, makes this method ideally suited to whole-genome annotation of alpha-helical transmembrane proteins.
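The abstract reports a topology prediction accuracy of 89% without defining the criterion in this section. One criterion commonly used in TM topology benchmarks counts a prediction as correct only when it has the same number of TM segments as the observed structure, each predicted segment overlaps its observed counterpart, and the N-terminus is placed on the same side of the membrane. The Python sketch below scores topologies under that assumed criterion. The label alphabet ('i' inside, 'o' outside, 'M' membrane), the function names and the 5-residue minimum overlap are illustrative assumptions, not details taken from this work, and signal peptides and re-entrant helices are ignored.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int]  # (start, end) residue indices, end exclusive

# Assumed labels: 'i' = inside loop, 'o' = outside loop, 'M' = TM helix.

def tm_segments(labels: str) -> List[Segment]:
    """Return the spans of consecutive 'M' (membrane) labels."""
    segments, seg_start = [], None
    for i, label in enumerate(labels + "-"):  # sentinel closes a final segment
        if label == "M" and seg_start is None:
            seg_start = i
        elif label != "M" and seg_start is not None:
            segments.append((seg_start, i))
            seg_start = None
    return segments

def n_terminal_side(labels: str) -> str:
    """Side of the first non-membrane residue: 'i', 'o', or '?' if none."""
    return next((label for label in labels if label in "io"), "?")

def topology_correct(observed: str, predicted: str, min_overlap: int = 5) -> bool:
    """Hypothetical criterion: same helix count, paired helices overlap by at
    least `min_overlap` residues, and the same N-terminal orientation."""
    obs, pred = tm_segments(observed), tm_segments(predicted)
    if len(obs) != len(pred):
        return False  # wrong number of TM helices
    for (s1, e1), (s2, e2) in zip(obs, pred):
        if min(e1, e2) - max(s1, s2) < min_overlap:
            return False  # paired helices do not overlap enough
    return n_terminal_side(observed) == n_terminal_side(predicted)

def topology_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (observed, predicted) pairs judged correct."""
    results = [topology_correct(obs, pred) for obs, pred in pairs]
    return sum(results) / len(results)

if __name__ == "__main__":
    observed = "i" * 5 + "M" * 10 + "o" * 4 + "M" * 10 + "i" * 3
    predicted = "i" * 7 + "M" * 10 + "o" * 2 + "M" * 10 + "i" * 3
    # Same helices but inverted orientation, so it should be scored wrong.
    flipped = predicted.translate(str.maketrans("io", "oi"))
    print(topology_accuracy([(observed, predicted), (observed, flipped)]))  # 0.5
```

The example shows why orientation matters: the flipped prediction places every helix correctly but still fails, because the N-terminus is on the wrong side of the membrane.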

Highlights

  • The TM helix/¬TM helix SVM performs significantly better than the re-entrant helix/¬re-entrant helix and inside loop/outside loop SVMs, and slightly better than the signal peptide/¬signal peptide and TM protein/globular protein SVMs. This reflects the relative ease with which the hydrophobic signal of a TM helix is detected, compared with sequence features within the other topological regions.

  • The Matthews correlation coefficient (MCC) of 0.80 compares favourably with the equivalent value of 0.76 achieved by MEMSAT3, which uses neural networks (NNs), when cross-validated against the same test set.
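For reference, the MCC of a binary classifier is MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). The minimal Python sketch below computes it, assuming residue-level counting of TM ('M') versus non-TM residues. The highlight does not state the level at which the 0.80 and 0.76 values were computed, so that choice, the label alphabet and the function names are assumptions for illustration.

```python
import math
from typing import Tuple

def confusion_counts(observed: str, predicted: str,
                     positive: str = "M") -> Tuple[int, int, int, int]:
    """Residue-level (TP, TN, FP, FN) for the `positive` label (assumed 'M')."""
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == positive:
            if obs == positive:
                tp += 1
            else:
                fp += 1
        elif obs == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion table."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column total is zero
    return (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    counts = confusion_counts("iiMMMMoo", "iiiMMMoo")
    print(counts, round(matthews_cc(*counts), 3))  # (3, 4, 0, 1) 0.775
```

Unlike plain accuracy, the MCC stays informative when the classes are unbalanced, which is the usual situation for TM residues within a full-length sequence.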

Summary

Introduction

Alpha-helical transmembrane (TM) proteins constitute roughly 30% of a typical genome and are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Because of the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. Early prediction methods, based on the physicochemical principle of a sliding window of hydrophobicity combined with the 'positive-inside' rule [3], have been superseded by machine learning approaches, which prevail because of their statistical formulation. These include hidden Markov models (HMMs), neural networks (NNs) and, more recently, support vector machines (SVMs). SVMs are considered more resilient to over-training than other machine learning methods.
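The classical approach mentioned above can be sketched in a few lines of Python. This illustrates the general technique, not the SVM method presented here: it averages Kyte-Doolittle hydropathy over a sliding window, marks every residue in a window whose mean reaches a threshold as membrane, and then applies the positive-inside rule by comparing lysine (K) and arginine (R) counts in loops on alternating sides of the membrane. The window of 19 residues and the threshold of 1.6 are commonly quoted choices, used here as assumptions; the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (positive values = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def predict_tm_segments(seq: str, window: int = 19,
                        threshold: float = 1.6) -> List[Tuple[int, int]]:
    """(start, end) spans, end exclusive, covered by any window whose mean
    hydropathy is at least `threshold` (assumed defaults)."""
    is_tm = [False] * len(seq)
    for w_start in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[w_start:w_start + window]) / window
        if mean >= threshold:
            for i in range(w_start, w_start + window):
                is_tm[i] = True
    segments, seg_start = [], None
    for i, flag in enumerate(is_tm + [False]):  # sentinel closes a final segment
        if flag and seg_start is None:
            seg_start = i
        elif not flag and seg_start is not None:
            segments.append((seg_start, i))
            seg_start = None
    return segments

def count_positive(loops: List[str]) -> int:
    """Total number of K and R residues in the given loops."""
    return sum(loop.count("K") + loop.count("R") for loop in loops)

def n_terminus_inside(seq: str, segments: List[Tuple[int, int]]) -> bool:
    """Positive-inside rule: loops alternate sides, so compare K+R on the
    N-terminal loop's side with the opposite side (ties count as inside)."""
    bounds = [0] + [pos for seg in segments for pos in seg] + [len(seq)]
    loops = [seq[bounds[k]:bounds[k + 1]] for k in range(0, len(bounds), 2)]
    return count_positive(loops[0::2]) >= count_positive(loops[1::2])

if __name__ == "__main__":
    seq = "KKRRKKRRKK" + "L" * 21 + "DDEEDDEEDD"  # toy single-pass sequence
    segs = predict_tm_segments(seq)
    print(segs, "N-terminus inside:", n_terminus_inside(seq, segs))
    # prints [(5, 36)] N-terminus inside: True
```

Because whole windows are marked, segment boundaries are only approximate: in the toy example the predicted segment extends five residues into each flanking loop. Fixed rules of this kind are what statistical methods such as HMMs, NNs and SVMs replace with parameters learned from data.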
