Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm.

Shaopeng Wang, Yudong Cai, Yu-Hang Zhang, Xijun Sun, Jiarui Li, Tao Huang

doi:10.2174/1386207322666181227144318

Abstract

As a newly uncovered post-translational modification on the ε-amino group of lysine residues, protein malonylation was found to be involved in metabolic pathways and certain diseases. Apart from experimental approaches, several computational methods based on machine learning algorithms were recently proposed to predict malonylation sites. However, previous methods failed to address the imbalance between the numbers of positive and negative samples. In this study, we identified the significant features of malonylation sites using a novel computational method that applied machine learning algorithms and balanced the data by applying the synthetic minority over-sampling technique (SMOTE). Four types of features, namely, amino acid (AA) composition, position-specific scoring matrix (PSSM), AA factor, and disorder, were used to encode residues in protein segments. Then, a two-step feature selection procedure, including maximum relevance minimum redundancy (mRMR) and incremental feature selection, together with the random forest algorithm, was performed on the constructed hybrid feature vector. An optimal classifier was built from the optimal feature subset and achieved an F1-measure of 0.356. Feature analysis was performed on several selected important features. Results showed that certain types of PSSM and disorder features may be closely associated with malonylation of lysine residues. Our study contributes to the development of computational approaches for predicting malonyllysine and provides insights into the molecular mechanism of malonylation.
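
The abstract names two ingredients that are easy to misread without a concrete instance: synthetic minority over-sampling and the F1-measure. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `smote` and `f1_measure`, the neighbour count `k`, and the toy two-dimensional data are all assumptions made for illustration; the paper's actual feature vectors, parameters, and sampling protocol are not given in the abstract.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic sample is a random point on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 4 positive (malonylated) and 12 negative samples.
positives = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2]]
negatives = [[0.8 + 0.01 * i, 0.9 - 0.01 * i] for i in range(12)]

# Over-sample the positive class until both classes have the same size.
new_positives = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_positives
```

Because F1 ignores true negatives, it is a more informative score than accuracy when negative samples greatly outnumber positive ones, which is the situation the abstract describes.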

Journal: Combinatorial Chemistry & High Throughput Screening
Publication Date: May 19, 2020
Citations: 1

Similar Papers

Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection
Yu-Fei Gao ... Bi-Qing Li
Mol. BioSyst. | VOL. 9
01 Jan 2013

Prediction of Linear B-Cell Epitopes with mRMR Feature Selection and Analysis
Bi-Qing Li ... Lu-Lu Zheng
Current Bioinformatics | VOL. 11
08 Mar 2016

A method to distinguish between lysine acetylation and lysine ubiquitination with feature selection and analysis
You Zhou ... Xiang-Yin Kong
Journal of Biomolecular Structure and Dynamics | VOL. 33
23 Jan 2015

PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection
Jing Wang ... Jing Li
BMC Systems Biology | VOL. 7
01 Dec 2013
