Abstract
The subcellular locations of proteins are closely related to their functions. Over the past few decades, applying machine learning algorithms to predict protein subcellular locations has been an important topic in proteomics. However, most studies in this field have used only amino acid sequences as the data source, and only a few have considered other types of protein data. For example, three-dimensional structures, which contain far more functional information than sequences, remain to be explored. In this work, we extracted various handcrafted features that describe protein structures from physical, chemical, and topological aspects, together with learned features obtained by deep neural networks, and used these features to classify protein subcellular locations. Our experimental results demonstrate that some of these structural features contribute to protein location classification and can help improve the performance of sequence-based location predictors. Our method offers a new perspective on the analysis of protein spatial distribution and is expected to help reveal the relationships between protein structures and functions.
Highlights
Given that subcellular/organelle structures in cells provide specific physiological and functional environments, the determination of the subcellular locations of proteins is believed to be an important aspect of the understanding of their functions [1,2].
Some prediction methods, such as Hum-mPLoc 3.0 [4] and SCLpred [5], constructed sequence features through a target signal search, motif analysis, or homology transfer, while some works in recent years, like DeepLoc [6] and HumDLoc [7], employed deep learning models to learn the protein features automatically.
In order to test the ability of the above descriptors to distinguish subcellular protein locations, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize
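As a rough illustration of the t-SNE visualization step mentioned above, the following is a minimal sketch, assuming the structural descriptors are available as a numeric matrix with one row per protein. The function name `embed_descriptors`, the perplexity value, and the synthetic three-class data are all hypothetical placeholders; they are not the descriptors, classes, or settings used in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_descriptors(features, perplexity=10.0, seed=0):
    """Project a (n_proteins, n_features) descriptor matrix to 2-D with t-SNE.

    Hypothetical helper: the perplexity and seed are illustrative choices,
    not values reported in the paper.
    """
    # Standardize each descriptor so that no single feature dominates
    # the pairwise distances t-SNE is built on.
    scaled = StandardScaler().fit_transform(features)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(scaled)


# Synthetic stand-in data (assumption): three "locations", 30 proteins each,
# 16 descriptor values per protein, drawn around class-specific centers.
rng = np.random.default_rng(0)
n_per_class, n_features = 30, 16
labels = np.repeat(np.arange(3), n_per_class)
centers = rng.normal(scale=4.0, size=(3, n_features))
features = centers[labels] + rng.normal(size=(labels.size, n_features))

embedding = embed_descriptors(features)
```

The resulting `embedding` array can be drawn as a scatter plot colored by `labels`; descriptors that separate locations well would be expected to form visibly distinct clusters, although t-SNE layouts depend on perplexity and random seed and should be read qualitatively.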
Summary
Given that subcellular/organelle structures in cells provide specific physiological and functional environments, the determination of the subcellular locations of proteins is believed to be an important aspect of the understanding of their functions [1,2]. The theoretical basis of sequence-based prediction is that a protein is transported into specific subcellular structure(s) according to its signal peptide, a short segment embedded in its amino acid sequence. Some prediction methods, such as Hum-mPLoc 3.0 [4] and SCLpred [5], constructed sequence features through a target signal search, motif analysis, or homology transfer, while some works in recent years, like DeepLoc [6] and HumDLoc [7], employed deep learning models to learn the protein features automatically.