PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models.

Elly Poretsky, Carson M Andorf, Taner Z Sen

doi:10.1002/pld3.554


Open Access

https://doi.org/10.1002/pld3.554


Abstract

Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
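The precision-recall tradeoff mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier that outputs one probability score per candidate site, and it uses hypothetical toy labels and scores that are not taken from the paper or from qPTMplants. The function name `precision_recall` is illustrative and is not part of PhosBoost.

```python
def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when sites scoring >= threshold are called positive.

    labels: 1 for a phosphorylated site, 0 otherwise.
    scores: predicted probabilities, one per site.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # true positives
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # false positives
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data (assumption): 4 true phosphosites among 10 candidates.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.8, 0.35, 0.3, 0.1, 0.05, 0.6]

if __name__ == "__main__":
    # Lowering the threshold recovers more true sites (higher recall)
    # but also admits more false positives (lower precision).
    for threshold in (0.3, 0.5, 0.75):
        p, r = precision_recall(labels, scores, threshold)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

In this toy setting, lowering the decision threshold raises recall and lowers precision. That is the kind of tradeoff to keep in mind when comparing classifiers on recall alone.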


Journal: Plant Direct	Publication Date: Dec 1, 2023
Citations: 2	License type: CC BY 4.0


