Abstract

Background

Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Accurate prediction of the subcellular localization of proteins can aid protein function prediction and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine (SVM) approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled before the prediction system can learn a classifier with good generalization ability. In real-world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data.

Results

In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high-quality prediction model. We first construct an initial classifier using a small set of labeled examples, and then use unlabeled instances to refine the classifier for future predictions.

Conclusion

Experimental results show that our methods can effectively reduce the workload of labeling data by using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
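The two-step idea described in the abstract (train on a few labeled examples, then refine with unlabeled ones) can be illustrated with a minimal self-training sketch. This is not the authors' method: the excerpt does not specify their semi-supervised algorithm, and a simple nearest-centroid classifier is swapped in for the SVM so that the example needs only the Python standard library. The function names, the `min_margin` threshold, and the toy two-dimensional feature vectors are all hypothetical.

```python
import math
from collections import defaultdict


def centroids(X, y):
    """Compute the mean feature vector of each class (the 'classifier')."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_margin(model, x):
    """Return (nearest class, margin), where margin is the gap between
    the distances to the two nearest class centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Self-training loop (assumed scheme, not taken from the paper):
    1. fit an initial classifier on the small labeled set;
    2. pseudo-label the unlabeled points predicted with a large margin;
    3. refit on labeled + pseudo-labeled points and repeat."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = centroids(X, y)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(model, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing left that the classifier is sure about
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = [x for x, _ in rest]
        model = centroids(X, y)  # refine the classifier
    return model


# Toy usage: two labeled proteins, five unlabeled ones (made-up features).
model = self_train(
    [[0.0, 0.0], [4.0, 4.0]],
    ["cytoplasm", "nucleus"],
    [[0.5, 0.5], [1.0, 0.0], [3.5, 4.0], [4.0, 3.0], [2.0, 2.1]],
)
label, _ = predict_with_margin(model, [1.0, 1.0])
print(label)
```

Only points predicted with a margin of at least `min_margin` are pseudo-labeled, so ambiguous points (here, the one near the midpoint between the two classes) stay in the unlabeled pool instead of reinforcing a possibly wrong guess.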

Highlights

  • Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods.

  • Organelles with different functions are the specialized subunits in a cell. (See Figure 1.) Most organelles are closed compartments separated by lipid membranes, such as mitochondria, chloroplasts, peroxisomes, lysosomes, endoplasmic reticulum, cell nucleus and Golgi apparatus.

  • We present a semi-supervised learning approach to solve the protein subcellular localization problem.


Summary

Introduction

Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. Organelles with different functions are the specialized subunits in a cell. (See Figure 1.) Most organelles are closed compartments separated by lipid membranes, such as mitochondria, chloroplasts, peroxisomes, lysosomes, endoplasmic reticulum, cell nucleus and Golgi apparatus. These compartments play different roles. For instance, mitochondria supply chemical energy in the form of ATP for cell survival; chloroplasts transform light energy into chemical energy through photosynthesis; peroxisomes participate in metabo-. Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery [2]. Take prokaryotic and eukaryotic proteins as examples: in prokaryotes, many proteins that are synthesized in the cytoplasm are found in noncytoplasmic locations [3], such as the cell membrane or the extracellular environment, while most eukaryotic proteins are encoded in the nuclear genome and transported to the cytosol for further synthesis.
