Protein Subnuclear Localization Based on Radius-SMOTE and Kernel Linear Discriminant Analysis Combined with Random Forest

Liwen Wu, Feng Wu, Qian Jiang, Shanshan Huang, Shaowen Yao, Xin Jin

doi:10.3390/electronics9101566


Open Access

https://doi.org/10.3390/electronics9101566


Journal: Electronics	Publication Date: Sep 24, 2020
Citations: 3	License type: CC BY 4.0

Affiliation: Yunnan University

Abstract

Protein subnuclear localization plays an important role in proteomics and can help researchers understand the biological functions of the nucleus. To date, most protein datasets used in such studies are unbalanced, which reduces the prediction accuracy of protein subnuclear localization, especially for the minority classes. This work therefore proposes a novel method for predicting protein subnuclear localization on unbalanced datasets. First, the position-specific scoring matrix (PSSM) is used to extract feature vectors from two benchmark datasets, and the useful features are then selected by kernel linear discriminant analysis (KLDA). Second, Radius-SMOTE is used to expand the samples of the minority classes and so address the imbalance in the datasets. Finally, the optimal feature vectors of the expanded datasets are classified by random forest. To evaluate the performance of the proposed method, four evaluation indices are calculated using the jackknife test. The results indicate that the proposed method outperforms other conventional methods and effectively improves the accuracy for both majority and minority classes.
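The oversampling step in the abstract can be illustrated with a minimal Python sketch. It shows the classic SMOTE interpolation that Radius-SMOTE is a variant of, not Radius-SMOTE itself, whose radius rule is not described on this page. The function name `smote_oversample`, the parameter `k`, and the toy points are assumptions made for illustration only.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Classic SMOTE interpolation (NOT the paper's Radius-SMOTE variant).

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical 2-D points standing in for KLDA-reduced protein features.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies. A radius-based variant would presumably constrain that region further, but that detail is an assumption here.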

Highlights

A biological cell is a highly ordered whole that can be divided into different organelles, such as the cytoplasm and the nucleus, according to spatial distribution and function.
This study proposes an effective protein subnuclear localization method that aims to overcome the imbalance of protein datasets and improve the prediction accuracy of protein subnuclear localization.
The dimensionality of the feature vectors is reduced by kernel linear discriminant analysis (KLDA), which removes redundant information from the protein dataset.

Summary

Introduction

A biological cell is a highly ordered whole that can be divided into different organelles, such as the cytoplasm and the nucleus, according to spatial distribution and function. Proteins are closely tied to life activities because they can perform their biological functions only when they are transported to the correct location within the cell or the nucleus [1,2]. As the life sciences have developed, traditional experimental techniques such as cell fractionation and electron microscopy can no longer meet the challenge of protein subnuclear localization, because the number of protein samples in datasets is growing rapidly [4]. To address this problem, computational intelligence can be used for protein subnuclear localization [5]. The critical issues in protein subnuclear localization using computational intelligence generally fall into two aspects: extracting useful features from protein sequences, and selecting an appropriate classification algorithm and evaluating the results [6].
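The evaluation aspect can be made concrete with a small sketch of the jackknife test, which in this literature means leave-one-out validation: each sample is held out once and predicted from all the others. The sketch below is an illustration under stated assumptions, not the authors' code. A 1-nearest-neighbour rule stands in for the random forest, the toy data are hypothetical, and overall accuracy and per-class sensitivity are assumed as indices because the four indices used in the paper are not named on this page.

```python
import math


def nearest_neighbour_predict(train_X, train_y, x):
    """1-NN stand-in classifier (the paper itself uses random forest)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def jackknife_predictions(X, y, predict=nearest_neighbour_predict):
    """Hold out each sample once and predict it from the remaining ones."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict(train_X, train_y, X[i]))
    return preds


def per_class_sensitivity(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    result = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        result[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return result


# Hypothetical unbalanced toy data: 6 "major" samples and 2 "minor" samples.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (0.5, 0.5), (2.0, 2.0)]
y = ["major"] * 6 + ["minor"] * 2

y_pred = jackknife_predictions(X, y)
accuracy = sum(p == t for p, t in zip(y_pred, y)) / len(y)
sensitivity = per_class_sensitivity(y, y_pred)
```

On this toy data the overall accuracy is 0.875 while the sensitivity of the minority class is only 0.5, which shows why per-class indices matter on unbalanced datasets.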

