Abstract

Excessive oxidative stress responses can threaten human health, so antioxidant proteins are essential for regulating the body's oxidative responses. Because few antioxidant proteins are known, it is difficult to extract features that represent them. Rather than using structural information, our method studies antioxidant proteins from a sequence perspective and focuses on the effect of data imbalance on sensitivity, which greatly improves the model's sensitivity in recognizing antioxidant proteins. We developed a method based on Composition of k-spaced Amino Acid Pairs (CKSAAP) and Conjoint Triad (CT) features, derived from amino acid composition and protein-protein interactions, respectively. SMOTE and the Max-Relevance-Max-Distance algorithm (MRMD) were used to balance the training data and to select the optimal feature subset, respectively. A random forest classifier was then applied to the selected feature subset and evaluated with 10-fold cross-validation. The sensitivity was 0.792, the specificity was 0.808, and the average accuracy was 0.8.
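
The two feature families named above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the helper names (`cksaap`, `conjoint_triad`, `encode`), the maximum gap `k_max=3`, the skipping of non-standard residues, and the use of plain relative frequencies for CT (the original Conjoint Triad normalisation differs) are all assumptions. CKSAAP yields 400 * (k_max + 1) values and CT yields 7^3 = 343 values, using the seven-class amino acid grouping commonly used for Conjoint Triad features.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven-class amino acid grouping commonly used for Conjoint Triad (CT) features.
CT_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_CLASS = {aa: idx for idx, group in enumerate(CT_GROUPS) for aa in group}


def cksaap(seq, k_max=3):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, counts every pair (seq[i], seq[i + k + 1]) and divides
    by the number of valid pairs, giving 400 values per gap.
    (k_max=3 is an assumed setting, not taken from the paper.)
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def conjoint_triad(seq):
    """Relative frequencies of the 343 triads of consecutive residue classes."""
    counts = [0] * 343
    total = 0
    for i in range(len(seq) - 2):
        a, b, c = (CT_CLASS.get(x) for x in seq[i:i + 3])
        if None in (a, b, c):  # skip triads with non-standard residues
            continue
        counts[a * 49 + b * 7 + c] += 1
        total += 1
    return [n / total if total else 0.0 for n in counts]


def encode(seq, k_max=3):
    """Hypothetical combined feature vector: CKSAAP followed by CT."""
    return cksaap(seq, k_max) + conjoint_triad(seq)
```

With the assumed `k_max=3`, every sequence maps to a fixed-length vector of 1600 + 343 = 1943 values, which is what allows a feature selector such as MRMD and a random forest to be applied to proteins of different lengths.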

Highlights

  • Reactive oxygen species (ROS) are products of metabolic processes (Birben et al, 2012) and include singlet oxygen, hydrogen peroxide, nitric oxide, superoxide anion radicals, and hydroxyl radicals

  • Our results were obtained by balancing the data with SMOTE, reducing dimensionality with the Max-Relevance-Max-Distance algorithm (MRMD), and classifying the selected features with a random forest classifier on the test set

  • We proposed a method with Composition of k-spaced Amino Acid Pairs (CKSAAP) and Conjoint Triad (CT) features to identify antioxidant proteins
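
SMOTE balances the classes by synthesising new minority-class samples instead of duplicating existing ones: each synthetic sample lies on the line segment between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, hypothetical implementation (functions `smote` and `balance`, with antioxidant proteins assumed to be the minority label `1`); it picks minority samples at random rather than following the original SMOTE's fixed per-sample quota, and a maintained library such as imbalanced-learn's `SMOTE` would be preferable in practice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of numbers) from the minority class.
    Each new sample is x + gap * (neighbour - x), where x is a random minority
    sample, neighbour is one of its k nearest minority samples (Euclidean
    distance), and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    extra = smote(minority, max(0, n_majority - len(minority)), seed=seed)
    return X + extra, y + [minority_label] * len(extra)
```

Because the abstract states that SMOTE was applied to the training data, a sketch like `balance` would be called on each training fold only, so that synthetic samples never leak into the data used to measure sensitivity and specificity.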


Summary

INTRODUCTION

Reactive oxygen species (ROS) are products of metabolic processes (Birben et al, 2012) and include singlet oxygen, hydrogen peroxide, nitric oxide, superoxide anion radicals, and hydroxyl radicals. Feng et al (2013) proposed a Naive Bayes approach based on sequence information; three years later, they changed their data processing method and proposed a model called AodPred (Feng et al, 2016). AodPred was based on a support vector machine with 3-spaced residue pairs, and its accuracy was significantly higher than that of the earlier model. In Meng's experiment, the sensitivity and specificity on the test set were 0.68 and 0.985, respectively, which means that the ranked features favored the recognition of non-antioxidant proteins over antioxidant proteins. Similar problems existed in Xu's research, even though she did apply a treatment for imbalanced data.
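
The imbalance problem is easiest to see from the confusion-matrix definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), where antioxidant proteins are taken as the positive class. The sketch below (hypothetical helper `evaluate`, with label `1` assumed to mean antioxidant) also reports balanced accuracy, the mean of sensitivity and specificity. Arithmetically, the abstract's figures satisfy (0.792 + 0.808) / 2 = 0.8, although the summary does not define its "average accuracy".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary labelling."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def evaluate(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and balanced accuracy."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sn = tp / (tp + fn) if tp + fn else 0.0  # recall on antioxidant proteins
    sp = tn / (tn + fp) if tn + fp else 0.0  # recall on non-antioxidant proteins
    n = tp + fn + tn + fp
    acc = (tp + tn) / n if n else 0.0
    return {
        "sensitivity": sn,
        "specificity": sp,
        "accuracy": acc,
        "balanced_accuracy": (sn + sp) / 2,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 10 antioxidant, 90 non-antioxidant.
    y_true = [1] * 10 + [0] * 90
    y_pred = [0] * 100  # a classifier that always predicts "non-antioxidant"
    print(evaluate(y_true, y_pred))
```

On this made-up test set, the trivial classifier reaches an accuracy of 0.9 and a specificity of 1.0, yet its sensitivity is 0.0 and its balanced accuracy only 0.5. This is the pattern that a high specificity paired with a much lower sensitivity can point to.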
