Abstract
The Golgi apparatus is a key organelle for protein synthesis in eukaryotic cells. Dysfunction of Golgi-resident proteins can lead to a range of diseases, especially neurodegenerative and inherited diseases, including diabetes, cancer, and cystic fibrosis. Accurate classification of Golgi-resident proteins may therefore contribute to drug development and, in turn, to drug therapy. This paper presents a novel method for predicting Golgi-resident protein types, called Golgi-XGBoost. First, feature vectors are extracted from protein sequences by fusing pseudo-amino acid composition (PseAAC), dipeptide composition (DC), pseudo-position specific scoring matrix (PsePSSM), and encoding based on grouped weight (EBGW). Second, conditional covariance minimization (CCM) is used to reduce the dimension of the feature vectors. Third, the synthetic minority over-sampling technique (SMOTE) is applied to balance the samples. Finally, the optimal feature vectors are input into the extreme gradient boosting (XGBoost) classifier to predict the type of Golgi-resident protein. The overall prediction accuracy on the training set is 92.1% under the jackknife test, which outperforms other state-of-the-art methods. The accuracy on the independent test dataset is 86.5%. These results show that the approach provides a new method for predicting the type of Golgi-resident protein. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/Golgi-XGBoost/.
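Of the four descriptors fused in the first step, dipeptide composition (DC) is the simplest to illustrate. The sketch below is not taken from the Golgi-XGBoost repository. It is a minimal pure-Python version under stated assumptions: the function name `dipeptide_composition` is hypothetical, only the 20 standard amino acids are counted, and frequencies are normalized by the number of valid adjacent pairs (the paper may instead normalize by sequence length minus one).

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: pairs containing a non-standard residue (e.g. 'X')
    are skipped and do not count toward the normalizing total.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("MKTAYIAKQR")  # toy sequence, not real data
    print(len(vec), round(sum(vec), 6))
```

In a fused representation, a vector like this would be concatenated with the PseAAC, PsePSSM, and EBGW vectors before dimensionality reduction.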
Highlights
The Golgi apparatus is an important subcellular organelle for protein synthesis in eukaryotic cells, composed of a stack of membrane-bound cisternae located between the endoplasmic reticulum and the cell surface
This paper focuses on five classification algorithms: random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), naïve Bayes (NB), and extreme gradient boosting (XGBoost)
Conditional covariance minimization (CCM) is used to reduce the dimension of the feature vectors
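The jackknife test used to score the classifiers can be sketched as leave-one-out evaluation: each sample is held out in turn, the model is fit on the remaining samples, and the held-out sample is predicted. The example below is an illustration only and not the authors' code. It uses a small pure-Python K-nearest neighbor classifier (one of the five algorithms listed above) as a stand-in for XGBoost so that it runs without third-party libraries; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to x.
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def jackknife_accuracy(X, y, k=3):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(X)):
        # Train on every sample except the i-th one.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Toy two-dimensional feature vectors with made-up labels.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = ["cis", "cis", "cis", "trans", "trans", "trans"]
    print(jackknife_accuracy(X, y, k=3))
```

The same loop applies to any classifier with a fit-and-predict interface. Note that if over-sampling such as SMOTE is part of the pipeline, applying it inside each fold rather than before the split avoids leaking synthetic copies of the held-out sample into training; whether the paper does this is not stated here.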
Summary
The Golgi apparatus is an important subcellular organelle for protein synthesis in eukaryotic cells. It is composed of a stack of membrane-bound cisternae located between the endoplasmic reticulum and the cell surface. Its main function is to process, compare, classify, and package proteins synthesized by the endoplasmic reticulum and send them to specific parts of the cell. The Golgi apparatus has three parts: the cis-Golgi, the medial-Golgi, and the trans-Golgi [2]. The cis-Golgi, which lies closest to the endoplasmic reticulum, handles reception: it receives vesicles for sorting and further processing before they are transferred to the trans-Golgi. The medial-Golgi carries out glycosylation modifications and the synthesis of polysaccharides and lipids. The trans-Golgi is responsible for releasing tagged and processed proteins to the plasma membrane or lysosomes via secretory vesicles.