Abstract
Predicting the subcellular location of proteins is necessary for understanding cell function. Because wet-lab experiments are costly and time-consuming, several machine learning methods have been developed to predict location computationally from primary protein sequences. However, two problems remain in state-of-the-art methods. First, several proteins appear in different subcellular structures simultaneously, whereas current methods assign each protein sequence to only one subcellular structure. Second, most software tools are trained on obsolete data and do not incorporate the latest databases. We propose a novel multi-label classification algorithm to solve the first problem and integrate several of the latest databases to improve prediction performance. Experiments demonstrate the effectiveness of the proposed method. The present study should facilitate research on cellular proteomics.
Highlights
Predicting protein subcellular location is necessary for understanding cell function
A typical machine-learning-based system for predicting protein subcellular location involves four basic steps: (1) establishing a protein data set, (2) extracting features from protein sequences, (3) designing a multi-label classification algorithm, and (4) constructing a Web server[6]
We found that advanced ensemble multi-label learning techniques would further improve the performance
Summary
Predicting the subcellular location of proteins is necessary for understanding cell function. Determining localization with conventional biochemical methods, such as cell separation, electron microscopy, and fluorescence microscopy, is expensive, time-consuming, and laborious[4], so several machine learning methods have been developed to predict location computationally from primary protein sequences. We propose a novel multi-label classification algorithm for proteins that appear in more than one subcellular structure, and integrate several of the latest databases to improve prediction performance. A typical machine-learning-based system for predicting protein subcellular location involves four basic steps: (1) establishing a protein data set, (2) extracting features from protein sequences, (3) designing a multi-label classification algorithm, and (4) constructing a Web server[6].
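To make steps (1) to (3) concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes amino acid composition as the sequence feature and a simple multi-label k-nearest-neighbour vote as the classifier. The toy sequences, their location labels, and the names amino_acid_composition, predict_locations, and TOY_TRAINING_SET are all hypothetical, chosen only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Step 1 (data set): invented toy sequences with hypothetical location labels.
# A protein may carry more than one label, which is the multi-label setting.
TOY_TRAINING_SET = [
    ("KKKRRKKRKR", {"nucleus"}),
    ("KRKKRRKKRA", {"nucleus", "cytoplasm"}),
    ("KKRRKLKRKR", {"nucleus", "cytoplasm"}),
    ("LLLVVILLAV", {"membrane"}),
    ("LVLLIVLLAI", {"membrane"}),
]


def amino_acid_composition(sequence):
    """Step 2 (features): fraction of each standard residue in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)  # ignore non-standard symbols
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_locations(query, training_set, k=3, threshold=0.5):
    """Step 3 (multi-label classifier, assumed here to be a k-NN vote).

    Every label carried by at least `threshold` of the k nearest training
    proteins is returned, so a query can receive several locations at once.
    """
    q = amino_acid_composition(query)
    ranked = sorted(
        training_set,
        key=lambda item: squared_distance(q, amino_acid_composition(item[0])),
    )
    neighbours = ranked[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n / len(neighbours) >= threshold}


if __name__ == "__main__":
    print(sorted(predict_locations("KRKRKKRKRK", TOY_TRAINING_SET)))
    print(sorted(predict_locations("LLVVLLIVAL", TOY_TRAINING_SET)))
```

Returning a set of labels rather than a single class is what lets one protein be assigned to several subcellular structures. A real system would replace the toy data with curated database entries and would likely use richer features and a stronger classifier; step (4), the Web server, is omitted here.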