PreCar_Deep: A deep learning framework for prediction of protein carbonylation sites based on Borderline-SMOTE strategy

Lili Song, Yaokui Xu, Minghui Wang, Yue Leng

doi:10.1016/j.chemolab.2021.104428

Abstract

Carbonylation is an irreversible post-translational modification of proteins that regulates various cellular physiological processes. Because experimental methods for identifying carbonylation sites are limited, computational prediction is needed. In this paper, a new carbonylation prediction model, PreCar_Deep, is proposed. First, six feature extraction methods are used to obtain the original feature space from the protein sequences. Then, the Group LASSO method is used to remove redundant information, and the Borderline-SMOTE oversampling method is employed to balance the data, yielding a new feature space. Finally, the processed data are fed into the deep learning framework constructed in this paper to predict carbonylation sites, and the performance of the model is evaluated using 10-fold cross-validation and independent test datasets. The AUC values on all four datasets exceed 90%. The experimental results show that PreCar_Deep is superior to other existing models and can help identify protein carbonylation sites. The source code and all datasets are available at https://github.com/QUST-SHULI/PreCar_Deep/.
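To make the balancing step concrete, the following is a minimal sketch of the Borderline-SMOTE idea in plain Python. It is not taken from the PreCar_Deep repository: the function name `borderline_smote`, its parameters (`m`, `k`, `seed`), and the toy data are illustrative assumptions, and the sampling loop is a simplification of the published algorithm. The sketch marks a minority sample as "borderline" when at least half, but not all, of its `m` nearest neighbours belong to the majority class, and then creates synthetic minority samples by interpolating between a borderline sample and one of its `k` nearest minority neighbours.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=5, seed=0):
    """Oversample the minority class near the class boundary (illustrative sketch).

    X is a list of equal-length numeric feature vectors, y a list of labels.
    Returns new lists (X_new, y_new) with synthetic minority samples appended
    until both classes have the same size, when borderline samples exist.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)

    # Step 1: find borderline ("danger") minority samples.
    danger = []
    for i in min_idx:
        others = [j for j in range(len(X)) if j != i]
        nbrs = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:m]
        n_major = sum(1 for j in nbrs if y[j] != minority)
        # At least half, but not all, neighbours are majority samples.
        if len(nbrs) / 2 <= n_major < len(nbrs):
            danger.append(i)

    X_new, y_new = list(X), list(y)
    if not danger or n_needed <= 0 or len(min_idx) < 2:
        return X_new, y_new  # nothing to synthesise

    # Step 2: interpolate between a borderline sample and a minority neighbour.
    for _ in range(n_needed):
        i = rng.choice(danger)
        peers = sorted((j for j in min_idx if j != i),
                       key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(peers)
        gap = rng.random()  # interpolation factor in [0, 1)
        X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new


# Toy one-dimensional data: six majority samples (label 0), three minority (label 1).
X = [[0], [1], [2], [3], [4], [5], [5.4], [6.5], [8.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = borderline_smote(X, y, minority=1, m=3, k=5, seed=0)
```

In this toy example only the minority sample at 5.4 sits on the class boundary, so all synthetic points are generated on segments starting from it. In a cross-validation setting, oversampling of this kind is normally applied to the training folds only, so that synthetic samples do not leak into the evaluation data.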

Full Text