Abstract

Predicting protein subcellular localization is important for the study of protein structure and function and can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the position-specific scoring matrix (PSSM) obtained by PSI-BLAST, we propose the PSSM-GSD method, which is designed around the distribution characteristics of the data. To capture as much protein sequence information as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused. A conditional entropy-based classifier chain algorithm and a support vector machine are then used to localize multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets; because the data are severely imbalanced, the SMOTE algorithm is used to oversample the minority classes. The experiments show that the AAO + PSSM∗ method proposed in this paper achieves overall accuracies of 83.1% and 86.8%, respectively. Compared experimentally with other methods, AAO + PSSM∗ performs well and can effectively predict protein subcellular location.
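The oversampling step mentioned above can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is not the paper's implementation; the function name `smote`, its parameters, and the toy data are assumptions chosen for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four 2-dimensional feature vectors (made-up values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class rather than duplicating existing samples exactly.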

Highlights

  • Cells are the basic unit of life, and they are further subdivided into organelles and other compartments, referred to here as subcells, including the mitochondria, cell membrane, and nucleus

  • More and more models have been proposed to predict protein subcellular localization, and their accuracy and computational speed have improved continuously. Therefore, protein subcellular localization prediction has become a major focus of bioinformatics research. A prediction model for protein subcellular localization mainly consists of two parts: the first selects a suitable method for extracting as much protein feature information as possible; the second builds a classification model to obtain better prediction results

  • In order to consider as much protein sequence information as possible, based on the idea of feature fusion, this study proposes a new feature extraction algorithm for protein subcellular localization prediction


Summary

Introduction

Cells are the basic unit of life, and they are further subdivided into organelles and other compartments, referred to here as subcells, including the mitochondria, cell membrane, and nucleus. Traditional experimental localization methods are costly and time-consuming [1], so an efficient and accurate computational model for predicting the subcellular location of proteins is urgently needed. For a newly discovered protein, choosing a well-performing model to predict its subcellular location can help us further understand the protein's activity in the organism. Therefore, protein subcellular localization is significant for the study of protein function and structure, and it helps us to recognize new proteins and better understand complex biological functions. More and more models have been proposed to predict protein subcellular localization, and their accuracy and computational speed have improved continuously. As a result, protein subcellular localization prediction has become a major focus of bioinformatics research. A prediction model for protein subcellular localization mainly consists of two parts: the first selects a suitable method for extracting as much protein feature information as possible; the second builds a classification model to obtain better prediction results.

