MpsLDA-ProSVM: Predicting multi-label protein subcellular localization by wMLDAe dimensionality reduction and ProSVM classifier

Qi Zhang,Shan Li,Qingmei Zhang,Yandan Zhang,Yu Han,Ruixin Chen,Bin Yu

doi:10.1016/j.chemolab.2020.104216

Open Access

Abstract

Multi-label proteins play a significant role in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of multi-label proteins in cells. This paper presents a new prediction model, MpsLDA-ProSVM, which predicts the SCL of multi-label proteins. First, we use four encoding algorithms, namely pseudo position-specific scoring matrix (PsePSSM), gene ontology (GO), conjoint triad (CT), and pseudo amino acid composition (PseAAC), to extract feature information from protein sequences. Then, for the first time, we use a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) to refine the features and reduce their dimensionality. Finally, we input the optimal feature subset into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and then use the Prediction and Relevance Ordering based SVM (ProSVM) classifier to predict the SCLs. Under leave-one-out cross-validation (LOOCV), the overall actual accuracies on the virus, plant, Gram-positive bacteria, and Gram-negative bacteria datasets are 98.06%, 98.97%, 99.81%, and 98.49%, which are 0.56%–9.16%, 1.07%–30.87%, 0.21%–6.91%, and 3.99%–8.59% higher, respectively, than those of other advanced methods. These comparisons indicate that MpsLDA-ProSVM can effectively predict the specific locations of multi-label proteins in cells.
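To make the evaluation protocol concrete, the following is a minimal sketch of leave-one-out cross-validation scored by overall actual accuracy. It assumes the definition commonly used in the multi-label SCL literature, where a prediction counts as correct only if the predicted location set matches the true set exactly; the abstract itself does not spell out the formula. The function names, the toy data, and the nearest-neighbor classifier are all hypothetical stand-ins for illustration and are not the ProSVM pipeline.

```python
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]
Labels = Set[str]


def overall_actual_accuracy(true_sets: List[Labels], pred_sets: List[Labels]) -> float:
    """Fraction of proteins whose predicted location set equals the true set.

    Assumed definition: partial matches (missing or extra locations) score zero.
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


def leave_one_out_predictions(
    features: List[Vector],
    label_sets: List[Labels],
    fit_predict: Callable[[List[Vector], List[Labels], Vector], Labels],
) -> List[Labels]:
    """Hold out each sample once, train on the rest, and predict the held-out one."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = label_sets[:i] + label_sets[i + 1:]
        predictions.append(fit_predict(train_x, train_y, features[i]))
    return predictions


def nearest_neighbor(train_x: List[Vector], train_y: List[Labels], query: Vector) -> Labels:
    """Hypothetical stand-in classifier (NOT ProSVM): copy the closest sample's labels."""
    def squared_distance(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(train_x[j], query))

    return train_y[min(range(len(train_x)), key=squared_distance)]


# Toy, made-up data: five "proteins" with 2-D features and location sets.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.5, 0.5]]
label_sets = [
    {"cytoplasm"},
    {"cytoplasm"},
    {"nucleus", "cytoplasm"},
    {"nucleus", "cytoplasm"},
    {"nucleus"},
]

preds = leave_one_out_predictions(features, label_sets, nearest_neighbor)
oaa = overall_actual_accuracy(label_sets, preds)
print(f"overall actual accuracy: {oaa:.2f}")  # prints 0.80 for this toy data
```

Because the metric demands an exact set match, it is stricter than per-label accuracy: a protein with two true locations contributes nothing if only one of them is predicted. In the toy run, the fifth sample has no neighbor sharing its label set, so four of five held-out predictions are correct.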

Full Text