Abstract

Post-translational modifications (PTMs) regulate protein behavior by modulating protein-protein interactions, enzymatic activity, and protein stability, and are essential in the translation of genotype to phenotype in eukaryotes. Currently, fewer than 4% of all eukaryotic PTMs are reported to have a biological function, a fraction that continues to shrink as the rate of PTM detection increases. Previously, we developed SAPH-ire (Structural Analysis of PTM Hotspots), a method for prioritizing the function potential of PTMs that has been used effectively to reveal novel PTM regulatory elements in discrete protein families (Dewhurst et al., 2015). Here, we apply SAPH-ire to the set of eukaryotic protein families that have both experimental PTM data and experimental 3D structure data, capturing 1,325 protein families with 50,839 unique PTM sites organized into 31,747 modified alignment positions (MAPs), of which 2010 (∼6%) have a known biological function. We show that an artificial neural network model (SAPH-ire NN) trained to identify MAP hotspots with biological function produces predictions that far surpass those based on single hotspot features, including nearest-neighbor PTM clustering methods. The greatest improvement in prediction occurs for positions with five or fewer PTMs, which represent 98% of all MAPs in the eukaryotic proteome and 90% of all MAPs found to have biological function. Analysis of the top 1092 MAP hotspots revealed 267 of truly unknown function (containing 5443 distinct PTMs), of which 165 could be mapped to human KEGG pathways for normal and/or disease physiology. Many high-ranking hotspots were also found to be disease-associated pathogenic sites of amino acid substitution, even though no PTM has been observed in the human protein family member.
Taken together, these experiments demonstrate that the functional relevance of a PTM can be predicted very effectively by neural network models, revealing a large but testable body of potential regulatory elements that impact hundreds of different biological processes important in eukaryotic biology and human health.
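
The abstract compares the SAPH-ire NN against single hotspot features such as PTM count. One simple way to make that kind of comparison concrete is precision at k: rank every MAP by a score and ask what fraction of the top k MAPs have a known function. The sketch below assumes this metric purely for illustration (the paper's own evaluation measures are not stated here), and the scores and labels are toy values, not SAPH-ire data.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_known: Sequence[bool], k: int) -> float:
    """Fraction of the k highest-scoring MAPs that carry a known biological function."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of MAPs")
    # Indices of MAPs sorted from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(is_known[i] for i in ranked[:k]) / k


# Toy data (hypothetical, not SAPH-ire results): six MAPs, two with known function.
known = [True, False, True, False, False, False]
nn_prob = [0.91, 0.40, 0.85, 0.30, 0.20, 0.10]  # hypothetical model probabilities
ptm_count = [2, 9, 1, 4, 3, 5]                  # single feature: PTMs observed per MAP

print(precision_at_k(nn_prob, known, 2))    # 1.0
print(precision_at_k(ptm_count, known, 2))  # 0.0
```

Because every score is judged by the same ranking metric, any single feature or model output can be passed in as `scores` and compared directly.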

Highlights

  • Since the discovery of phosphorylation in 1954 [1], posttranslational modifications (PTMs) have emerged as a broad class of protein features that expand the functional proteome in eukaryotes

  • The SAPH-ire Data Set: Post-translational Modifications (PTMs) and Protein Structure. As of the submission date of this manuscript, the collection of experimentally verified eukaryotic data available from dbPTM included 213,022 eukaryotic PTMs, which we coalesced into 85,443 modified alignment positions (MAPs) distributed across 4813 protein families

  • 50,839 (∼24%) PTMs, 31,747 (∼37%) MAPs, and 1325 (∼28%) protein families can be analyzed by SAPH-ire, which requires experimental nonchimeric structures and experimental PTM data (Fig. 1D; see methods) [9]
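
Coalescing PTMs into MAPs amounts to grouping modifications that fall on the same column of a protein family's alignment. The minimal sketch below assumes a MAP is keyed only by (family, alignment column), ignoring PTM type; the `PTM` record, its field names, and the example families are hypothetical. It also recomputes the coverage percentages from the counts given in the highlights.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PTM(NamedTuple):
    family: str         # protein family identifier (hypothetical field)
    protein: str        # family member carrying the modification
    alignment_col: int  # column of the modified residue in the family alignment
    ptm_type: str       # e.g. "phosphorylation"


def coalesce_into_maps(ptms: Iterable[PTM]) -> Dict[Tuple[str, int], List[PTM]]:
    """Group PTMs that fall on the same alignment column of the same family."""
    maps: Dict[Tuple[str, int], List[PTM]] = defaultdict(list)
    for ptm in ptms:
        maps[(ptm.family, ptm.alignment_col)].append(ptm)
    return dict(maps)


ptms = [
    PTM("kinaseA", "P1", 120, "phosphorylation"),
    PTM("kinaseA", "P2", 120, "phosphorylation"),  # same column as P1, so same MAP
    PTM("kinaseA", "P1", 305, "ubiquitination"),
    PTM("histoneB", "H3", 9, "acetylation"),
]
maps = coalesce_into_maps(ptms)
print(len(ptms), "PTMs ->", len(maps), "MAPs")  # 4 PTMs -> 3 MAPs

# Fraction of the dbPTM-derived totals that SAPH-ire can analyze.
for label, analyzable, total in [("PTMs", 50_839, 213_022),
                                 ("MAPs", 31_747, 85_443),
                                 ("families", 1_325, 4_813)]:
    print(f"{label}: {analyzable / total:.0%}")  # 24%, 37%, 28%
```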

Summary

Introduction

Since the discovery of phosphorylation in 1954 [1], posttranslational modifications (PTMs) have emerged as a broad class of protein features that expand the functional proteome in eukaryotes. We built Structural Analysis of PTM Hotspots (SAPH-ire), an algorithm that integrates multiple predictors of PTM function into a single, quantitative function potential (FP) score that rank-orders each hotspot within or between protein families [6] (Fig. 1). We apply SAPH-ire to protein families for which both PTM and protein structure data are currently available, yielding function potential predictions for 50,839 experimental PTM sites distributed across 31,747 MAPs. Using a neural network model (SAPH-ire NN) trained to identify embedded known-function MAPs, we derived a probability score that rank-orders all MAPs, including those of unknown function, by their likelihood of function. Using a strictly conservative probability threshold, we characterized the top-ranked 1092 MAPs, designated “function potential hotspots,” revealing 267 of truly unknown function, a striking fraction of which are found mutated in human disease regardless of whether the human protein contains an observed PTM.
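
The text does not spell out how the SAPH-ire NN turns per-MAP features into a probability. As a minimal sketch, assuming a one-hidden-layer network with ReLU hidden units and a sigmoid output, the scoring, rank-ordering, and conservative thresholding steps could look like the following. The feature names, weights, and threshold are hypothetical placeholders chosen for illustration, not trained SAPH-ire parameters.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mlp_probability(features: List[float], w_hidden: List[List[float]],
                    b_hidden: List[float], w_out: List[float], b_out: float) -> float:
    """Forward pass of a one-hidden-layer network: MAP features -> P(known function)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU units
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights for three per-MAP features scaled to 0-1
# (e.g. PTM count, nearby-PTM count, solvent accessibility).
W_HIDDEN = [[1.5, 2.0, -0.5], [0.5, 1.0, 1.0]]
B_HIDDEN = [0.0, -0.5]
W_OUT = [2.0, 1.0]
B_OUT = -2.0

features_by_map = {"MAP_a": [0.9, 0.8, 0.2],
                   "MAP_b": [0.1, 0.2, 0.9],
                   "MAP_c": [0.5, 0.6, 0.4]}
probs = {name: mlp_probability(f, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
         for name, f in features_by_map.items()}

THRESHOLD = 0.8  # hypothetical conservative cutoff
ranked = sorted(probs, key=probs.get, reverse=True)      # highest probability first
hotspots = [name for name in ranked if probs[name] >= THRESHOLD]
print(ranked, hotspots)  # ['MAP_a', 'MAP_c', 'MAP_b'] ['MAP_a', 'MAP_c']
```

In practice the weights would come from training on MAPs with known function; only the forward pass and the rank-then-threshold step are shown here.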
