Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier

Xiaotong Guo, Chunyu Wang, Fulin Liu, Ying Ju, Zhen Wang

doi:10.1038/srep28087

Abstract

Predicting protein subcellular location is necessary for understanding cell function. Because wet-lab experiments are costly and time consuming, several machine learning methods have been developed to predict subcellular location computationally from primary protein sequences. However, two problems remain in state-of-the-art methods. First, some proteins reside in several subcellular structures simultaneously, whereas current methods assign each protein sequence to only one subcellular structure. Second, most software tools are trained on outdated data and miss the latest databases. We proposed a novel multi-label classification algorithm to solve the first problem and integrated several of the latest databases to solve the second, improving prediction performance. Experiments demonstrated the effectiveness of the proposed method. The present study should facilitate research on cellular proteomics.
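To make the first problem concrete, the sketch below contrasts a single-label rule, which keeps only the top-scoring location, with a thresholded multi-label rule on one toy protein. The location names, the scores, the 0.5 threshold, and the helper names (`to_indicator`, `single_label_predict`, `multi_label_predict`, `hamming_loss`) are hypothetical illustrations, not the label set or algorithm of this paper; Hamming loss is a standard multi-label metric chosen here only for illustration.

```python
# Hypothetical location classes for illustration only (not the paper's label set).
LOCATIONS = ["cytoplasm", "nucleus", "mitochondrion", "plasma membrane"]


def to_indicator(labels):
    """Encode a set of locations as a 0/1 vector ordered like LOCATIONS."""
    return [1 if loc in labels else 0 for loc in LOCATIONS]


def single_label_predict(scores):
    """Single-label rule: keep only the highest-scoring location."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]


def multi_label_predict(scores, threshold=0.5):
    """Multi-label rule (assumed threshold): keep every location scoring >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def hamming_loss(true_vec, pred_vec):
    """Fraction of location slots that are predicted incorrectly."""
    return sum(t != p for t, p in zip(true_vec, pred_vec)) / len(true_vec)


# Toy protein present in both the cytoplasm and the nucleus; scores are made up.
truth = to_indicator({"cytoplasm", "nucleus"})
scores = [0.81, 0.67, 0.12, 0.05]

print(hamming_loss(truth, single_label_predict(scores)))  # 0.25
print(hamming_loss(truth, multi_label_predict(scores)))   # 0.0
```

Even with scores that rank both true locations highest, the single-label rule must drop one of them, whereas the thresholded rule can report both.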

Highlights

Predicting protein subcellular location is necessary for understanding cell function.
A typical machine-learning-based protein subcellular localization system comprises four basic steps: (1) construction of a protein data set, (2) protein sequence feature extraction, (3) design of a multi-label classification algorithm, and (4) construction of a Web server[6].
We found that advanced ensemble multi-label learning techniques could further improve prediction performance.
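One common way to build an ensemble multi-label classifier is to average the per-location scores of several base classifiers and then apply a threshold. The sketch below assumes that combination rule, a 0.5 threshold, and a fallback that returns the single best location when none passes; `ensemble_predict` and the member scores are hypothetical and are not the ensemble described in this paper.

```python
def ensemble_predict(member_scores, labels, threshold=0.5):
    """Combine base classifiers by averaging per-label scores, then thresholding.

    member_scores: one dict per base classifier, mapping label -> score in [0, 1].
    Assumed rule: labels whose mean score >= threshold are predicted; if none
    qualifies, the single best label is returned so every protein gets a location.
    """
    n = len(member_scores)
    mean = {lab: sum(m.get(lab, 0.0) for m in member_scores) / n for lab in labels}
    predicted = {lab for lab, s in mean.items() if s >= threshold}
    if not predicted:
        predicted = {max(mean, key=mean.get)}
    return predicted, mean


# Three hypothetical base classifiers scoring the same protein.
labels = ["cytoplasm", "nucleus", "mitochondrion"]
members = [
    {"cytoplasm": 0.9, "nucleus": 0.4, "mitochondrion": 0.1},
    {"cytoplasm": 0.7, "nucleus": 0.6, "mitochondrion": 0.2},
    {"cytoplasm": 0.8, "nucleus": 0.7, "mitochondrion": 0.0},
]
predicted, mean = ensemble_predict(members, labels)
print(sorted(predicted))  # ['cytoplasm', 'nucleus']
```

In this toy example the first member alone would miss the nucleus (score 0.4), but the averaged nucleus score of about 0.57 passes the threshold, so the ensemble predicts both locations.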

Summary

Introduction

Predicting protein subcellular location is necessary for understanding cell function. Determining protein subcellular localization with conventional biochemical methods, such as cell separation, electron microscopy, and fluorescence microscopy, is expensive, time consuming, and laborious[4]. Several machine learning methods have therefore been developed to predict subcellular location computationally from primary protein sequences. A typical machine-learning-based protein subcellular localization system comprises four basic steps: (1) construction of a protein data set, (2) protein sequence feature extraction, (3) design of a multi-label classification algorithm, and (4) construction of a Web server[6]. We proposed a novel multi-label classification algorithm to handle proteins that reside in more than one subcellular location and integrated several of the latest databases to improve prediction performance.
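Steps (2) and (3) of this pipeline can be sketched end to end. The example below assumes amino acid composition (AAC), a classic sequence feature, and a simple majority-vote k-nearest-neighbour multi-label rule; the paper's actual features and classifier may differ. The training sequences and labels are synthetic toy data standing in for step (1), and `aac`, `euclidean`, and `knn_multilabel` are hypothetical helpers.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: relative frequency of each standard residue."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_multilabel(query, train, k=3):
    """Assumed rule: predict each location held by a strict majority of the k nearest neighbours.

    train: list of (sequence, set_of_locations) pairs.
    """
    q = aac(query)
    neighbours = sorted(train, key=lambda item: euclidean(q, aac(item[0])))[:k]
    votes = Counter(loc for _, locs in neighbours for loc in locs)
    return {loc for loc, c in votes.items() if c > k / 2}


# Synthetic toy data (step 1 stand-in), not sequences from any real database.
train = [
    ("KKKRRKKRPKKR", {"nucleus"}),
    ("KKRRKKKRKKPR", {"nucleus", "cytoplasm"}),
    ("KRKKRRKKKRKP", {"nucleus", "cytoplasm"}),
    ("LLVVAALLIVAL", {"plasma membrane"}),
    ("VLLAIVLLAVVL", {"plasma membrane"}),
]

print(sorted(knn_multilabel("KKRKKRRKKPKR", train)))  # ['cytoplasm', 'nucleus']
print(sorted(knn_multilabel("LLAVVLLIAVLL", train)))  # ['plasma membrane']
```

Because AAC ignores residue order, sequences with the same residue counts map to the same feature vector (the three K/R-rich toy sequences above are an example); order-aware encodings such as pseudo amino acid composition are often used alongside it.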

Objectives

Methods

Results

Conclusion