Abstract

Multi-label proteins play a significant role in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of multi-label proteins in cells. This paper presents a new prediction model, MpsLDA-ProSVM, which predicts the SCL of multi-label proteins. First, we use four encoding algorithms, namely pseudo position-specific scoring matrix (PsePSSM), gene ontology (GO), conjoint triad (CT) and pseudo amino acid composition (PseAAC), to extract feature information from protein sequences. Then, for the first time, we use a weighted multi-label linear discriminant analysis framework based on the entropy weight form (wMLDAe) to refine the extracted features. Finally, we input the optimal feature subset into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a combined ranking of relevant labels, and then use the Prediction and Relevance Ordering based SVM (ProSVM) classifier to predict the SCLs. Under leave-one-out cross-validation (LOOCV), the overall actual accuracy on the virus, plant, Gram-positive bacteria and Gram-negative bacteria datasets is 98.06%, 98.97%, 99.81% and 98.49%, respectively, which is 0.56%–9.16%, 1.07%–30.87%, 0.21%–6.91% and 3.99%–8.59% higher than other advanced methods. These comparisons indicate that MpsLDA-ProSVM can effectively predict the specific location of multi-label proteins in cells.
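
To make the feature-extraction step more concrete, the following is a minimal sketch of conjoint triad (CT) encoding, one of the four encodings named above. It is not the authors' implementation. It assumes the commonly used grouping of the 20 amino acids into seven classes and a simple frequency normalization; the names `CT_CLASSES` and `conjoint_triad` are hypothetical, and the paper may normalize or handle non-standard residues differently.

```python
# Seven amino-acid classes commonly used for conjoint triad encoding.
# This grouping is an assumption; the abstract does not specify it.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CT_CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of triad frequencies for one sequence."""
    # Map residues to class indices. Non-standard residues (e.g. X) are
    # dropped, which joins their neighbours; a simplification for this sketch.
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]

    # Count every window of three consecutive class indices (7^3 = 343 triads).
    counts = [0] * (7 ** 3)
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1

    # Normalize counts to frequencies; sequences shorter than 3 give all zeros.
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * 343
```

Calling `conjoint_triad` on each protein sequence yields one fixed-length vector per protein, which could then be concatenated with the other encodings before the feature-refinement step.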
