Abstract

This review systematically introduces recent advances in predicting protein subcellular localization in multi-label systems, where the constituent proteins may simultaneously occur at, or move between, two or more location sites and hence have exceptional biological functions worthy of special notice. Each predictor included in this review has a user-friendly web-server, through which most experimental scientists can easily obtain their desired data without going through the complicated mathematics involved.

Highlights

  • As elucidated in two recent comprehensive review papers [1, 2], to develop a really useful bioinformatics tool, one needs to observe the guidelines of Chou’s 5-steps rule [2–36] and go through the following five steps: 1) select or construct a valid benchmark dataset to train and test the predictor; 2) represent the samples with an effective formulation that can truly reflect their intrinsic correlation with the target to be predicted; 3) introduce or develop a powerful algorithm to conduct the prediction; 4) properly perform cross-validation tests to objectively evaluate the anticipated prediction accuracy; 5) establish a user-friendly web-server for the predictor that is accessible to the public

  • The protein samples in the iLoc- series [49,50,51,52,53,54,55] were formulated by incorporating the GO information and PSSM information into the general PseAAC

  • The development of protein subcellular location prediction can be separated into two stages

Summary

INTRODUCTION

As elucidated in two recent comprehensive review papers [1, 2], to develop a really useful bioinformatics tool, one needs to observe the guidelines of Chou’s 5-steps rule [2–36] and go through the following five steps: 1) select or construct a valid benchmark dataset to train and test the predictor; 2) represent the samples with an effective formulation that can truly reflect their intrinsic correlation with the target to be predicted; 3) introduce or develop a powerful algorithm to conduct the prediction; 4) properly perform cross-validation tests to objectively evaluate the anticipated prediction accuracy; 5) establish a user-friendly web-server for the predictor that is accessible to the public. This is just like the case of many machine-learning algorithms, which can be used in most areas of statistical analysis.
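
Steps 1 to 4 above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method of any predictor covered in this review: the six sequences and their location labels are invented, plain amino acid composition stands in for the richer sample formulations, a nearest-neighbor rule stands in for the operation engine, and the exact-match rate is only one of several possible multi-label metrics. All function and variable names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Step 1 (assumption): a tiny invented benchmark. Each entry pairs a
# sequence with a SET of location labels, so multi-label proteins fit.
TOY_DATASET = [
    ("AAAAAAAAAL", {"nucleus"}),
    ("AAAAAAAALL", {"nucleus"}),
    ("KKKKKKKKKE", {"cytoplasm", "nucleus"}),
    ("KKKKKKKKEE", {"cytoplasm", "nucleus"}),
    ("WWWWWWWWWG", {"membrane"}),
    ("WWWWWWWWGG", {"membrane"}),
]


def amino_acid_composition(sequence):
    """Step 2 (simplified): 20-D vector of residue frequencies."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_labels(query, training):
    """Step 3 (toy engine): copy the label set of the nearest neighbor."""
    nearest = min(training, key=lambda item: squared_distance(query, item[0]))
    return nearest[1]


def jackknife_exact_match(dataset):
    """Step 4: jackknife (leave-one-out) test.

    Each sample is held out in turn and predicted from all the others.
    A prediction counts as correct only if the predicted label set is
    identical to the true label set.
    """
    samples = [
        (amino_acid_composition(seq), frozenset(labels))
        for seq, labels in dataset
    ]
    hits = 0
    for i, (features, true_labels) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if predict_labels(features, training) == true_labels:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    print(f"Jackknife exact-match rate: {jackknife_exact_match(TOY_DATASET):.2f}")
```

Scoring by exact match of the whole label set is deliberately strict: a protein located in both the cytoplasm and the nucleus is counted as wrong if only one of the two sites is predicted.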

PREDICTING SUBCELLULAR LOCALIZATION OF PROTEINS
FOUR SERIES OF PREDICTORS
Benchmark Dataset
Sample Formulation
Operation Engine
Metrics and Cross-Validation
Cross-Validation and Jackknife Test
Web Servers
CONCLUSIONS AND PERSPECTIVE