Abstract

Post‐translational modifications (PTMs), chemical or proteinaceous covalent alterations to the side chains of amino acid residues in proteins, are a rapidly expanding feature class of significant importance in cell biology. Because the burden of experimental proof is high and experimentalists lack effective means to prioritize PTMs by functional significance, fewer than ~2% of all PTMs currently have an assigned biological function. Here, we describe a new artificial neural network model, SAPH‐ire TFx, for the functional prediction of experimentally observed eukaryotic PTMs. Unlike previous functional PTM prioritization models, SAPH‐ire TFx is trained to emphasize metrics that maximally capture the range of diverse feature sets comprising the functional modified eukaryotic proteome. The model was generated through systematic evaluation of input features, model architectures, training procedures, and interpretation metrics using a 2018 training dataset of 430,750 PTMs, of which 7,480 have literature‐supported evidence of biological function. The resulting model was used to classify an expanded 2019 dataset of 512,015 PTMs (12,867 known functional), including 102,475 PTMs not encountered in the original dataset. Model output from the 2019 extended dataset was benchmarked against pre‐existing prediction models, revealing superior performance in classifying functional and/or disease‐linked PTM sites and drawing attention to PTMs that were previously thought inconsequential. Finally, a dynamic web interface provides customizable graphical and tabular visualization of PTM and SAPH‐ire TFx data in the context of all modifications within a protein family, exposing several metrics by which important functional PTMs can be identified for investigation.
