Abstract

The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. Dysfunction of GA proteins can result in neurodegenerative diseases. Accurate identification of protein sub-Golgi localization may therefore assist drug development and help clarify the role of the GA in various cellular processes. In this paper, a new computational method is proposed for distinguishing cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and the g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) classifier is used to distinguish cis-Golgi proteins from trans-Golgi proteins. In jackknife cross-validation, the proposed method achieves a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method in identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function prediction.
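The four scores reported above are all derived from the confusion matrix of a binary classifier. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function name `binary_metrics`, the choice of one class as "positive", and the counts in the usage line are illustrative assumptions and do not come from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    Which class counts as "positive" is an assumption of this sketch.
    """
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, chosen only to exercise the function.
scores = binary_metrics(tp=8, fn=2, tn=9, fp=1)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside Sn and Sp for imbalanced datasets such as the one described here.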

Highlights

  • The feature set consisting of 3-gap dipeptide composition (3-gap DC), CSP-PSSM-DC, CSP-Bi-gram PSSM, and CSP-ED-PSSM (CSP: Common Spatial Patterns; PSSM-DC: PSSM-Dipeptide Composition; ED-PSSM: Evolutionary Difference-PSSM) gives the maximum discrimination between cis-Golgi proteins and trans-Golgi proteins, with a sensitivity (Sn) of 0.876, a specificity (Sp) of 0.853, an accuracy (Acc) of 0.864, a Matthews Correlation Coefficient (MCC) of 0.728, and an area under the ROC curve (AUC) of 0.912

  • We present a performance analysis of hybrid feature sets constructed by combining the CSP-based feature extraction method with 3-gap DC
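The 3-gap DC feature named in the highlights can be illustrated with a short sketch. It assumes the commonly used definition of g-gap dipeptide composition: the relative frequency of each ordered pair of the 20 standard amino acids separated by exactly g intervening residues, which gives a 400-dimensional vector. The function name, the handling of non-standard residues, and the normalisation by the number of valid pairs are assumptions of this sketch and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g=3):
    """Return a 400-dimensional g-gap dipeptide composition vector.

    Counts ordered residue pairs (sequence[i], sequence[i + g + 1]) and
    divides by the number of counted pairs. Pairs containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    span = g + 1  # distance between the two residues of a pair
    total = 0
    for i in range(len(sequence) - span):
        pair = sequence[i] + sequence[i + span]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short or no valid pairs
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


# Toy sequence, for illustration only.
features = g_gap_dipeptide_composition("ACDEFGHIK", g=3)
```

With g = 0 the function reduces to the ordinary adjacent dipeptide composition; larger gaps capture correlations between residues that are further apart in the sequence.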

Summary

Introduction

The Golgi Apparatus (GA), an important eukaryotic organelle involved in the metabolism of numerous proteins [1], is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes [2,3]. The GA comprises three distinct membrane-bounded cisternae located between the endoplasmic reticulum and the cell surface: the cis-Golgi, medial-Golgi, and trans-Golgi [6]. The cis-Golgi and trans-Golgi are thought to be specialised cisternae that lead proteins into and out of the GA [7]. The cis-Golgi functions as the receiving end for the biosynthetic output from the endoplasmic reticulum [4]. The function of the trans-Golgi is to sort and ship proteins to their intended destinations [8]. Although the basic mechanism of GA processing is known, how Golgi cisternae transport biosynthetic secretory cargo, and how resident Golgi proteins are localized to particular sets of cisternae, remain important and fascinating questions that await resolution [9]. To elucidate the functions of the GA in various cellular processes, an initial but crucial step is to identify the protein composition of the subcellular compartments of the GA.

