Predicting protein subcellular localization is a challenging problem, particularly when query proteins have multi-label features meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods can only be used to deal with the single-label proteins. Actually, multi-label proteins should not be ignored because they usually bear some special function worthy of in-depth studies. By introducing the "multi-label learning" approach, a new predictor, called iLoc-Animal, has been developed that can be used to deal with the systems containing both single- and multi-label animal (metazoan except human) proteins. Meanwhile, to measure the prediction quality of a multi-label system in a rigorous way, five indices were introduced; they are "Absolute-True", "Absolute-False" (or Hamming-Loss"), "Accuracy", "Precision", and "Recall". As a demonstration, the jackknife cross-validation was performed with iLoc-Animal on a benchmark dataset of animal proteins classified into the following 20 location sites: (1) acrosome, (2) cell membrane, (3) centriole, (4) centrosome, (5) cell cortex, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracellular, (11) Golgi apparatus, (12) lysosome, (13) mitochondrion, (14) melanosome, (15) microsome, (16) nucleus, (17) peroxisome, (18) plasma membrane, (19) spindle, and (20) synapse, where many proteins belong to two or more locations. For such a complicated system, the outcomes achieved by iLoc-Animal for all the aforementioned five indices were quite encouraging, indicating that the predictor may become a useful tool in this area. It has not escaped our notice that the multi-label approach and the rigorous measurement metrics can also be used to investigate many other multi-label problems in molecular biology. As a user-friendly web-server, iLoc-Animal is freely accessible to the public at the web-site .
Read full abstract