Abstract

Accurate identification of crotonylation sites on human non-histone proteins plays a crucial role in human biological research. Traditional experimental methods are labor-intensive and time-consuming, so computational methods have become increasingly popular in recent years. However, most previous methods predict only histone crotonylation sites, and there is a lack of dedicated tools for identifying crotonylation sites in human non-histone proteins. In this study, we propose SEBP_HNHC, a stacking ensemble-based bi-level predictor for human non-histone crotonylation that is combined with an iterative feature representation strategy. To make full use of all the training information and to address the extreme imbalance between positive and negative samples, we propose a two-layer training data structure consisting of a preliminary training dataset and a balanced training dataset. The preliminary training dataset is first divided into twelve subsets using a special data division method. Multi-view feature encoding schemes are then integrated to comprehensively represent the attributes of the samples in each training subset. Next, the Gini index combined with sequential forward search is employed to optimize the features of each preliminary training subset. Then, 84 preliminary models are built by training seven commonly used and efficient classifiers on each of the twelve preliminary training subsets. The balanced training dataset is then used to combine the capabilities of the 84 preliminary models: each model outputs a probability value, and these 84 values serve as 84-D features for the stacking ensemble. Finally, deep ensemble features are mined through the iterative feature representation strategy. Experimental results show that SEBP_HNHC achieves robust performance in crotonylation site prediction. SEBP_HNHC is therefore an effective tool for identifying crotonylation on lysine sites in human non-histone proteins.
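The bi-level structure described in the abstract (twelve preliminary subsets, seven classifiers each, 84 probabilities stacked as features on a balanced set) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the division method, the seven classifiers, the Gini/sequential-forward-search step, or the iterative feature representation strategy. The sketch therefore makes several assumptions. Negatives are dealt into twelve disjoint chunks, and each chunk is paired with all positives. A hypothetical toy `NearestCentroid` learner stands in for every classifier, and seven copies of it reproduce the 12 × 7 = 84 model count. Feature selection and the iterative step are omitted. The helper names `split_preliminary`, `train_preliminary`, `meta_features`, and `train_meta` are illustrative only, and the data are synthetic.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of a bi-level stacking scheme. The division rule, the toy
# learner and all function names are assumptions, not SEBP_HNHC's actual code.

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = site, 0 = non-site)


def split_preliminary(positives: Sequence[Sample], negatives: Sequence[Sample],
                      n_subsets: int = 12, seed: int = 0) -> List[List[Sample]]:
    """Assumed division: shuffle the negatives, deal them into n_subsets
    disjoint chunks, and pair each chunk with all positives."""
    neg = list(negatives)
    random.Random(seed).shuffle(neg)
    return [list(positives) + neg[i::n_subsets] for i in range(n_subsets)]


class NearestCentroid:
    """Toy stand-in for a base classifier; scores samples by centroid distance."""

    def fit(self, data: Sequence[Sample]) -> "NearestCentroid":
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in data if y == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        """Pseudo-probability of the positive class: 1.0 at the positive
        centroid, 0.0 at the negative centroid."""
        d = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
             for lab, c in self.centroids.items()}
        total = d[0] + d[1]
        return 0.5 if total == 0 else d[0] / total


def train_preliminary(subsets: Sequence[Sequence[Sample]],
                      factories: Sequence[Callable[[], NearestCentroid]]) -> List[NearestCentroid]:
    """Level 1: fit every learner on every subset (12 x 7 = 84 models here)."""
    return [make().fit(subset) for subset in subsets for make in factories]


def meta_features(models: Sequence[NearestCentroid], x: Sequence[float]) -> List[float]:
    """One probability per preliminary model, i.e. an 84-D vector for 84 models."""
    return [m.predict_proba(x) for m in models]


def train_meta(models: Sequence[NearestCentroid], balanced: Sequence[Sample]) -> NearestCentroid:
    """Level 2: fit a meta-learner on the stacked probabilities of a balanced set."""
    stacked = [(meta_features(models, x), y) for x, y in balanced]
    return NearestCentroid().fit(stacked)


# Synthetic, well-separated toy data (not crotonylation data).
rng = random.Random(1)


def toy(label: int, n: int) -> List[Sample]:
    mu = 1.0 if label == 1 else -1.0
    return [([rng.gauss(mu, 0.3), rng.gauss(mu, 0.3)], label) for _ in range(n)]


positives, negatives = toy(1, 20), toy(0, 240)  # heavily imbalanced pool
balanced = toy(1, 10) + toy(0, 10)              # separate balanced set
subsets = split_preliminary(positives, negatives)
models = train_preliminary(subsets, [NearestCentroid] * 7)
meta = train_meta(models, balanced)
print(len(subsets), len(models), len(meta_features(models, [1.0, 1.0])))  # 12 84 84
```

Under this assumed scheme, dealing the negatives into disjoint chunks means each negative sample is used by exactly one preliminary subset. That is one way to use all of the training data while keeping each subset far less imbalanced than the full pool. The actual ratios in SEBP_HNHC depend on the division method described in the full paper.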
