Abstract

Predicting protein subcellular location is necessary for understanding cell function. Several machine learning methods have been developed to predict subcellular location computationally from primary protein sequences, because wet-lab experiments are costly and time-consuming. However, state-of-the-art methods still have two problems. First, some proteins occur in several subcellular structures simultaneously, whereas current methods assign each protein sequence to only one subcellular structure. Second, most software tools are trained on outdated data and miss the latest databases. We proposed a novel multi-label classification algorithm to solve the first problem and integrated several of the latest databases to improve prediction performance. Experiments demonstrated the effectiveness of the proposed method. The present study should facilitate research on cellular proteomics.
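
The difference between single-label and multi-label prediction can be shown with a minimal Python sketch. It assumes some predictor has already produced one score per candidate location. The location names, the scores, and the 0.5 threshold are hypothetical. The thresholding rule is a generic baseline, not the authors' algorithm.

```python
# Hypothetical location names and scores, for illustration only.
LOCATIONS = ["cytoplasm", "nucleus", "mitochondrion", "plasma membrane"]


def single_label(scores: list[float]) -> str:
    """Single-label decision: keep only the highest-scoring location."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LOCATIONS[best]


def multi_label(scores: list[float], threshold: float = 0.5) -> set[str]:
    """Multi-label decision: keep every location whose score reaches the threshold."""
    chosen = {loc for loc, s in zip(LOCATIONS, scores) if s >= threshold}
    # Fall back to the top location so every protein gets at least one label.
    return chosen or {single_label(scores)}


scores = [0.81, 0.67, 0.12, 0.05]  # made-up per-location scores for one protein
print(single_label(scores))         # cytoplasm
print(sorted(multi_label(scores)))  # ['cytoplasm', 'nucleus']
```

Here the single-label rule discards the nuclear localization, while the multi-label rule keeps both locations.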

Highlights

  • Predicting protein subcellular location is necessary for understanding cell function

  • A typical machine-learning-based system for predicting protein subcellular location includes four basic steps: (1) establishment of a protein data set, (2) protein sequence feature extraction, (3) design of a multi-label classification algorithm, and (4) construction of a Web server[6]

  • We found that advanced ensemble multi-label learning techniques would further improve prediction performance
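
One common form of ensemble multi-label learning is soft voting. Several base predictors each score every location, the scores are averaged per location, and each averaged score is thresholded on its own. The minimal sketch below assumes that scheme. The source does not specify which ensemble technique is meant, and the member scores and the 0.5 threshold are made up.

```python
from statistics import mean


def ensemble_predict(member_scores: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Average per-location scores across ensemble members, then threshold each location.

    member_scores[m][j] is member m's score for location j (hypothetical inputs).
    Returns a 0/1 indicator vector with one entry per location.
    """
    n_labels = len(member_scores[0])
    if any(len(row) != n_labels for row in member_scores):
        raise ValueError("all members must score the same set of locations")
    averaged = [mean(row[j] for row in member_scores) for j in range(n_labels)]
    return [1 if a >= threshold else 0 for a in averaged]


# Three made-up base predictors scoring four locations for one protein.
members = [
    [0.9, 0.4, 0.2, 0.1],
    [0.7, 0.6, 0.1, 0.3],
    [0.8, 0.7, 0.3, 0.2],
]
print(ensemble_predict(members))  # [1, 1, 0, 0]
```

The first member alone would reject the second location (0.4). The ensemble average of about 0.57 accepts it, which shows how combining members can recover labels that a single predictor misses.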


Summary

Introduction

Predicting protein subcellular location is necessary for understanding cell function. Determining protein subcellular localization with conventional biochemical research methods, such as cell separation, electron microscopy, and fluorescence microscopy, is expensive, time-consuming, and laborious[4]. Several machine learning methods have therefore been developed to predict subcellular location computationally from primary protein sequences. A typical machine-learning-based system for predicting protein subcellular location includes four basic steps: (1) establishment of a protein data set, (2) protein sequence feature extraction, (3) design of a multi-label classification algorithm, and (4) construction of a Web server[6]. We proposed a novel multi-label classification algorithm to handle proteins that occur in more than one subcellular structure, and integrated several of the latest databases to improve prediction performance.
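
For step (2), a widely used and simple sequence feature is amino acid composition (AAC), which is the fraction of each of the 20 standard residues in the sequence. The source does not say which features are used, so the Python sketch below is an illustrative assumption. Skipping non-standard symbols such as X or U is also a choice made here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition (AAC) of a protein sequence.

    Non-standard symbols (e.g. X, B, Z, U) are skipped, so the fractions are
    computed over standard residues only and sum to 1.
    """
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Counter returns 0 for residues that never occur.
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("MKKLLAX")  # made-up toy sequence
print(len(features))  # 20
```

Because the vector has a fixed length regardless of sequence length, it can be passed directly to the multi-label classifier in step (3).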
