Abstract

Canonical π-helices are short, relatively unstable secondary structure elements found in proteins. They comprise seven or more residues and are present in 15% of all known protein structures, often in functionally important regions such as ligand- and ion-binding sites. Given their similarity to α-helices, the prediction of π-helices is a challenging task and none of the currently available secondary structure prediction methods tackle it. Here, we present PiPred, a neural network-based tool for predicting π-helices in protein sequences. By performing a rigorous benchmark we show that PiPred can detect π-helices with a per-residue precision of 48% and sensitivity of 46%. Interestingly, some of the α-helices mispredicted by PiPred as π-helices exhibit a geometry characteristic of π-helices. Also, despite being trained only with canonical π-helices, PiPred can identify 6-residue-long α/π-bulges. These observations suggest an even higher effective precision of the method and demonstrate that π-helices, α/π-bulges, and other helical deformations may impose similar constraints on sequences. PiPred is freely accessible at: https://toolkit.tuebingen.mpg.de/#/tools/quick2d. A standalone version is available for download at: https://github.com/labstructbioinf/PiPred, where we also provide the CB6133, CB513, CASP10, and CASP11 datasets, commonly used for training and validation of secondary structure prediction methods, with correctly annotated π-helices.

Highlights

  • Helices, dominant protein secondary structure elements, are defined by the recurring pattern of the hydrogen bonds between the amide hydrogen (NH) and the carbonyl oxygen (CO) atoms

  • To assess the functional role of π-helices, we surveyed 2,555 representative π-helices present in protein structures co-crystallized with ligands and found that 24% of them interact with at least one ligand, most frequently with protoporphyrin IX and its derivatives, nucleoside derivatives (e.g. NAD, NADP, FAD), and various ions

  • To systematically investigate the association between the presence of π-helices and biological functions, we performed Gene Ontology (GO) enrichment analysis, with a focus on identifying GO terms overrepresented in proteins containing π-helices

Introduction

Helices, the dominant protein secondary structure elements, are defined by the recurring pattern of hydrogen bonds between the amide hydrogen (NH) and the carbonyl oxygen (CO) atoms. Unlike α-helices, π-helices, a less frequent type of helix, contain hydrogen bonds between residues in positions i and i + 5 (Fig. 1). Canonical π-helices are characterized by the presence of at least two π-type (i → i + 5) hydrogen bonds, and the minimal length of a π-helix is seven residues[1]. Dedicated methods for the annotation of π-helices in protein structures have been developed[1,10,11], making it possible to identify π-helices that are missed by the general-purpose methods. The accuracy of existing secondary structure predictors for the π-helix class (“I”) is zero. This can be attributed to the properties of the datasets commonly used in secondary structure prediction, such as CB6133[19,22] or CB513[19,23], which contain only a small number of π-helices due to inaccuracies in DSSP[9]. The only method capable of predicting π-helices is limited to those occurring in transmembrane proteins[24].
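The hydrogen-bond definition above can be illustrated with a minimal Python sketch. It is not the annotation procedure used by the cited methods or by PiPred. The function names, the input format (a list of residue-index pairs), and the rule that the π-type bonds must start at consecutive residues are assumptions made for illustration only. The i → i + 3 and i → i + 4 offsets are the standard definitions of 3-10 and α-helices.

```python
from typing import Iterable, List, Tuple

# Residue offset of a backbone hydrogen bond and the helix type it indicates.
HELIX_BY_OFFSET = {3: "3-10", 4: "alpha", 5: "pi"}


def classify_bond(i: int, j: int) -> str:
    """Return the helix type implied by a hydrogen bond between residues i and j."""
    return HELIX_BY_OFFSET.get(j - i, "other")


def canonical_pi_helices(
    bonds: Iterable[Tuple[int, int]],
    min_bonds: int = 2,
    min_length: int = 7,
) -> List[Tuple[int, int]]:
    """Return (first, last) residue indices of candidate canonical pi-helices.

    Assumption (hypothetical rule): the pi-type bonds of one helix start at
    consecutive residues, and the segment spans from the first bonded residue
    to the last bonded residue.
    """
    starts = sorted({i for i, j in bonds if j - i == 5})
    segments: List[Tuple[int, int]] = []
    run_start = prev = None
    count = 0
    for s in starts + [None]:  # None is a sentinel that closes the last run
        if prev is not None and s == prev + 1:
            count += 1  # the run of consecutive pi-type bonds continues
        else:
            if prev is not None:
                first, last = run_start, prev + 5
                if count >= min_bonds and last - first + 1 >= min_length:
                    segments.append((first, last))
            run_start, count = s, 1  # start a new run
        prev = s
    return segments


# Toy input: two alpha-type bonds, two consecutive pi-type bonds, one alpha-type bond.
toy_bonds = [(10, 14), (11, 15), (12, 17), (13, 18), (14, 18)]
pi_segments = canonical_pi_helices(toy_bonds)
```

With the toy input, the two π-type bonds starting at residues 12 and 13 span residues 12 to 18, which is exactly seven residues and matches the minimal length quoted above.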
