DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes.

Rui Deng, Ke Wu, Yang Li, Zihan Zhang, Zhenkun Shi, Xiaoping Liao, Hongwu Ma, Dehang Wang, Yuanyuan Huang, Zhitao Mao, Zhiwen Wang, Jiawei Lin

doi:10.3390/ijms25094803

Open Access

Abstract

The molecular weight (MW) of an enzyme is a critical parameter in enzyme-constrained models (ecModels). It is determined by two factors: which subunits make up the enzyme and how many copies of each subunit are present. Although the number of subunits (NS) can in principle be obtained from UniProt, this information is not readily available for most proteins. In this study, we addressed this gap by extracting and curating subunit information from the UniProt database to build a robust benchmark dataset. We then propose a novel model, DeepSub, which combines a protein language model with a bidirectional gated recurrent unit (GRU) to predict NS in homo-oligomers based solely on protein sequences. DeepSub achieves an accuracy of up to 0.967, outperforming QUEEN. To validate DeepSub, we predicted NS for protein homo-oligomers that have been reported in the literature but are not documented in UniProt, including homoserine dehydrogenase from Corynebacterium glutamicum, Matrilin-4 from Mus musculus and Homo sapiens, and the Multimerin protein family from M. musculus and H. sapiens. The predictions agree closely with the reported findings, underscoring the reliability and utility of DeepSub.
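The abstract states that an enzyme's MW depends on which subunits it contains and on how many copies of each are present. A minimal sketch of that relationship follows, assuming the common ecModel convention that the complex MW is the sum of each subunit's MW multiplied by its copy number. For a homo-oligomer this reduces to monomer MW times NS, which is the quantity DeepSub is meant to supply. The function names, subunit labels, and MW values (in kDa) are hypothetical and for illustration only; this is not code from the DeepSub authors.

```python
from typing import Mapping


def complex_molecular_weight(
    subunit_mw: Mapping[str, float], stoichiometry: Mapping[str, int]
) -> float:
    """Sum of (subunit MW x copy number) over all subunits in the complex.

    Hypothetical helper: assumes the stoichiometric-sum convention for
    enzyme MW used in enzyme-constrained models.
    """
    # Every subunit in the stoichiometry must have a known MW.
    missing = set(stoichiometry) - set(subunit_mw)
    if missing:
        raise KeyError(f"no MW given for subunits: {sorted(missing)}")

    total = 0.0
    for subunit, copies in stoichiometry.items():
        if copies < 1:
            raise ValueError(f"copy number for {subunit!r} must be >= 1")
        total += subunit_mw[subunit] * copies
    return total


def homo_oligomer_mw(monomer_mw: float, ns: int) -> float:
    """MW of a homo-oligomer: a single subunit type present ns times."""
    return complex_molecular_weight({"monomer": monomer_mw}, {"monomer": ns})


if __name__ == "__main__":
    # Hypothetical 48 kDa monomer predicted to form a tetramer (NS = 4).
    print(homo_oligomer_mw(48.0, 4))  # 192.0
    # Hypothetical hetero-complex A2B1 with A = 30 kDa and B = 20 kDa.
    print(complex_molecular_weight({"A": 30.0, "B": 20.0}, {"A": 2, "B": 1}))  # 80.0
```

The sketch shows why an incorrect NS propagates directly into the enzyme MW: mispredicting a tetramer as a dimer halves the MW used by the model.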
