Abstract

The 2-oxoglutarate/Fe(II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism, and it plays important roles in cancer, cardiovascular disease, and other diseases. These enzymes are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins and compensate for time-consuming and expensive experimental identification. However, machine learning has not yet been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model that identifies whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, protein sequences were recoded with 673 reduced amino acid alphabets to determine the optimal feature representation scheme. A 10-fold cross-validation test showed that the model identifies 2OG oxygenases with an accuracy of 91.04%. In addition, results on an independent dataset showed that the model has excellent generalization and robustness. It is expected to become an effective tool for the identification of 2OG oxygenases. Further analysis suggested that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help follow-up studies on the catalytic mechanism of 2OG oxygenases and the way they interact with their substrates. Based on the model, a user-friendly web server was established and is accessible at http://bioinfor.imu.edu.cn/ogferaac.

Highlights

  • 2-Oxoglutarate/Fe(II)-dependent (2OG) oxygenases are widely distributed in animals, plants, and microorganisms, and this study addresses their identification

  • To obtain the optimal amino acid reduction scheme and the appropriate K value (K = 1, 2, 3), we calculated the accuracy of each of the 673 reduction schemes listed in RAACBook (Zheng et al., 2019) at each K value

  • With t = 33 and s = 15 (Table 2), where t denotes the t-th reduction type in RAACBook and s denotes the size of the reduced amino acid cluster (RAAC), the accuracy reached its highest value, 83.75% (Figure 4B)
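
The recoding step described in the highlights can be illustrated with a minimal Python sketch. It maps each residue to a representative letter of its cluster and then computes K-tuple frequencies over the reduced alphabet. The five-cluster scheme `EXAMPLE_CLUSTERS` and the function names are hypothetical and chosen only for illustration; this is not the t = 33, s = 15 scheme from RAACBook, and it is not necessarily how OGFE_RAAC builds its features.

```python
from itertools import product

# Hypothetical 5-cluster reduction scheme covering the 20 standard residues.
# Illustration only: NOT the t = 33, s = 15 scheme selected in the paper.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQGP", "DE", "KR"]


def build_mapping(clusters):
    """Map every residue to the first letter of the cluster it belongs to."""
    return {aa: cluster[0] for cluster in clusters for aa in cluster}


def reduce_sequence(seq, mapping):
    """Recode a protein sequence; residues not in the mapping (e.g. X) are skipped."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


def kmer_composition(reduced, alphabet, k):
    """Return K-tuple frequencies in a fixed order over all len(alphabet)**k tuples."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = max(len(reduced) - k + 1, 0)  # number of overlapping windows
    for i in range(total):
        counts[reduced[i:i + k]] += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


mapping = build_mapping(EXAMPLE_CLUSTERS)
alphabet = [cluster[0] for cluster in EXAMPLE_CLUSTERS]
reduced = reduce_sequence("MKVLDE", mapping)
features = kmer_composition(reduced, alphabet, k=2)
```

Under this scheme the feature vector has s^K entries, where s is the number of clusters, so a 15-cluster alphabet with K = 2 would give 225 features compared with 400 for the full 20-letter alphabet. Scanning many schemes and K values, as the highlights describe, amounts to repeating this recoding for each candidate and comparing the resulting classifier accuracies.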

Introduction

2OG oxygenases are widely distributed in animals, plants, and microorganisms. In animals, their catalytic range includes hydroxylation and N-demethylation proceeding via hydroxylation; in plants and microbes, they catalyze a wider range of reactions, including hydroxylation, ring formation, cleavage, oxidation, rearrangement, desaturation, and halogenation (Farrow and Facchini, 2014; Kawai et al., 2014). Many machine learning methods for the prediction of metal ion-binding proteins have achieved excellent results. Reduced amino acid alphabets have shown superior performance in the prediction of human and nonhuman enzymes (Wang H. et al., 2021), ion channel-targeted conotoxins (Sun et al., 2020), Plasmodium secretory proteins (Zhang et al., 2020), and defensin peptides (Zuo et al., 2019). The results of 10-fold cross-validation and an independent test set showed that OGFE_RAAC could accurately predict 2OG oxygenases.
