Prediction Of Protein Subcellular Localization Research Articles

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods that identify protein subcellular locations efficiently and effectively. In particular, the rapidly increasing number of protein sequences entering the genome databases calls for the development of automated analysis methods.

Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers.

Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.

The prediction of protein subcellular localization is critical for inferring protein functions, gene regulation and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which enables us to computationally predict yeast protein subcellular localization. However, widely used protein sequence representation techniques, such as amino acid composition and Chou’s pseudo amino acid composition (PseAAC), have difficulty extracting adequate information about the interactions between residues and the position distribution of each residue. Therefore, novel sequence representations are still urgently needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), reflecting local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use features with hundreds of dimensions. In practice, the feature representation is highly efficient in predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 for the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view features-based method that combines the two above-mentioned features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization. This novel model achieves prediction accuracies of 0.927 and 0.871 for the CL317 and ZW225 datasets, respectively, better than other existing methods in the jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations with evidence from published articles in authoritative journals and books.
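To make the baseline representations and the jackknife test mentioned in this abstract more concrete, the following is a minimal Python sketch. It computes amino acid composition and dipeptide composition and scores them with a jackknife (leave-one-out) loop. It is not the authors' GCGR or NSI method, and a 1-nearest-neighbour rule stands in for the support vector machine purely to keep the example dependency-free. All function names and the toy sequences and labels are hypothetical and do not come from the CL317 or ZW225 datasets.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent-pair frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]


def features(seq):
    """Concatenate both views into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def jackknife_accuracy(sequences, labels):
    """Leave-one-out test: hold out each protein in turn and predict its
    label from the nearest remaining protein (1-NN, an assumed stand-in
    for the SVM named in the abstract)."""
    vectors = [features(s) for s in sequences]
    correct = 0
    for i, v in enumerate(vectors):
        others = (k for k in range(len(vectors)) if k != i)
        nearest = min(others, key=lambda k: dist(v, vectors[k]))
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)


if __name__ == "__main__":
    # Made-up toy data for illustration only.
    toy_sequences = ["AAAAAA", "AAAAAC", "WWWWWW", "WWWWWY"]
    toy_labels = ["cytoplasm", "cytoplasm", "nucleus", "nucleus"]
    print(jackknife_accuracy(toy_sequences, toy_labels))
```

The jackknife loop recomputes the prediction once per protein, so each sample is tested exactly once against a model that never saw it. Because the order of samples never affects the outcome, jackknife results are deterministic and easy to compare across methods.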

Articles published on Prediction Of Protein Subcellular Localization

Benchmarking subcellular localization and variant tolerance predictors on membrane proteins

Advances in the Prediction of Protein Subcellular Locations with Machine Learning

Prediction of protein subcellular localization based on multilayer sparse coding

Evaluation and Interpretation of Transcriptome Data Underlying Heterogeneous Chronic Obstructive Pulmonary Disease.

A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization

Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.

Prediction of apoptosis protein subcellular localization via heterogeneous features and hierarchical extreme learning machine

Tissue-Specific Subcellular Localization Prediction Using Multi-Label Markov Random Fields.

Prediction of Apoptosis Protein Subcellular Localization with Multilayer Sparse Coding and Oversampling Approach.

A Novel Protein Subcellular Localization Method With CNN-XGBoost Model for Alzheimer's Disease.

Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images.

Bioinformatics Workflow for Gonococcal Proteomics.

Molecular factors associated with pathogenicity of Phytophthora cinnamomi

Predicting Protein Localization Sites Using an Ensemble Self-Labeled Framework

Deep Forest-based Prediction of Protein Subcellular Localization.

Prediction of LncRNA Subcellular Localization with Deep Learning from Sequence Features

Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou’s PseAAC

PROSES: A Web Server for Sequence-Based Protein Encoding.

Consistent prediction of GO protein localization

BUSCA: an integrative web server to predict subcellular localization of proteins.
