Abstract

Post-translational modification (PTM) sites have become popular for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and evaluate their performance on new experimentally determined sites. As a guide for effective experimental design, predictors require both high specificity and sensitivity. However, the self-reported performance may often not be indicative of prediction quality and detection of new sites is not guaranteed. We have benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance is found to widely overestimate the real accuracy measured on independent datasets. No predictor performs better than random on new examples, indicating the refined models do not sufficiently generalize to detect new sites. The number of false positives is high and precision low, in particular for non-collagen proteins whose motifs are not conserved. As hydroxylation site predictors do not generalize for new data, caution is advised when using PTM predictors in the absence of independent evaluations, in particular for highly specific sites involved in signalling.

Highlights

  • Post-translational modifications (PTMs) are alterations of the primary protein structure, including both new covalent links and cleavage events

  • We evaluated post-translational modification (PTM) predictors for hydroxylation sites and found that they perform no better than random, in strong contrast to performances reported in their original publications

  • PTMs are chemical amino acid alterations providing the cell with conditional mechanisms to fine-tune protein function, regulating complex biological processes such as signalling and the cell cycle


Introduction

Post-translational modifications (PTMs) are alterations of the primary protein structure, including both new covalent links and cleavage events. PTMs provide a way to expand the spectrum of protein functions as well as an additional layer for pathway regulation [3]. They are catalyzed by enzymes that identify a specific site in the substrate protein, with a plurality of PTM motifs residing in intrinsically disordered regions in order to facilitate enzyme accessibility [4]. Computational methods can become hypothesis generators for an effective design of PTM experiments. Their implementation is straightforward due to the sequence specificity and peculiar physico-chemical properties of PTM motifs. The method should be robust enough to maintain performance across a range of different datasets, as it is often not clear which experimental conditions may introduce biases. On both accounts, PTM predictors may be problematic as they are rarely assessed by independent third parties. Generalizing models for PTM site recognition is difficult as the number of experimental observations is low and many new types of motifs are still poorly characterized.
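
To make the evaluation criteria concrete, the following is a minimal sketch of how sensitivity, specificity, precision and the Matthews correlation coefficient (MCC) could be computed for a site predictor scored on an independent dataset. The function name `site_metrics` and the toy labels are illustrative assumptions; they are not the benchmark's code or data.

```python
from math import sqrt


def site_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary site predictions.

    y_true, y_pred: sequences of 0/1 labels, one per candidate residue
    (1 = modified site, 0 = unmodified). Hypothetical helper for illustration.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),   # fraction of true sites recovered
        "specificity": ratio(tn, tn + fp),   # fraction of non-sites rejected
        "precision": ratio(tp, tp + fp),     # fraction of predictions that are real
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy example (made-up labels): 2 true sites among 8 candidate residues.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = site_metrics(y_true, y_pred)
print(metrics)
```

An MCC close to 0 corresponds to predictions that are no better than random, which is why it is a useful companion to sensitivity and specificity when positive sites are sparse.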

