Abstract

Antimicrobial resistance is driving pharmaceutical companies to investigate alternative therapeutic approaches. One approach that has drawn growing attention in drug development is the use of antimicrobial peptides (AMPs). Antibacterial peptides (ABPs), which occur naturally as part of the immune response, can serve as powerful, broad-spectrum antibiotics. However, conventional laboratory procedures for screening and discovering ABPs are expensive and time-consuming, and computational methods can significantly improve their identification. In this paper, we introduce a machine learning method for the fast and accurate prediction of ABPs. We gathered more than 6000 peptides from publicly available datasets and extracted 1209 features (peptide characteristics) from their sequences. We selected an optimal feature subset by applying correlation-based and random forest feature selection techniques. Finally, we designed an ensemble gradient boosting model (GBM) to predict putative ABPs. We evaluated the model using receiver operating characteristic (ROC) curves, comparing its area under the curve (AUC) with that of several other models, including a recurrent neural network, a support vector machine, and iAMPpred. The AUC for the GBM was ~0.98, more than 3% better than any of the other models.
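As a rough illustration of the correlation-based selection step mentioned in the abstract, the sketch below greedily drops any feature that is strongly correlated with one already kept. The function names, the toy feature table, and the 0.9 cutoff are assumptions made for this example and are not taken from the paper; the random forest selection step is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    columns: dict mapping feature name -> list of values (one per peptide).
    threshold: hypothetical cutoff chosen for illustration, not from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table (made-up values, one entry per peptide).
features = {
    "length": [10, 20, 30, 40],
    "mol_weight": [1100, 2200, 3300, 4400],  # perfectly correlated with length
    "net_charge": [2, -1, 3, 0],
}
selected = drop_correlated(features)
print(selected)
```

Because the filter is greedy, the result depends on column order: the first feature of a correlated pair is the one that survives.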

Highlights

  • Antimicrobial resistance poses a severe health threat because it compromises treatment of a wide range of infections caused by bacteria, viruses, or fungi [1,2]

  • We evaluated our model using receiver operating characteristic (ROC) curves, calculating the area under the curve (AUC) for several different models for comparison, including a recurrent neural network (RNN), a support vector machine (SVM), and iAMPpred

  • An ROC curve gives the relationship between the false positive rate (FPR) and the true positive rate (TPR) at several threshold settings
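
To make the ROC and AUC definitions above concrete, here is a minimal sketch that sweeps the decision threshold over a set of scores, records the resulting (FPR, TPR) pairs, and integrates them with the trapezoidal rule. The labels and scores are invented for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold.

    labels: 1 for positive (e.g. ABP), 0 for negative; both classes must occur.
    scores: model scores, where higher means more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up example: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(round(area, 4))
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 corresponds to random ranking.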

Summary

Introduction

Proceedings 2020, 66, 6

Antimicrobial resistance poses a severe health threat because it compromises treatment of a wide range of infections caused by bacteria, viruses, or fungi [1,2]. A promising computational approach is a supervised machine learning model that uses the physicochemical properties of ABPs. For example, the authors of [6] used pseudo amino acid composition and fuzzy k-nearest neighbors to develop a tool called iAMP-2L to predict AMPs. Later, the authors of [4] developed an online prediction server called iAMPpred, which improves prediction performance by integrating compositional and physicochemical properties with structural features (features are peptide characteristics) and using a support vector machine (SVM) model. Other researchers proposed a tool called Deep-AmPEP30 to improve short AMP prediction using deep learning [9]. They built it on an optimal feature set based on reduced amino acid composition together with a convolutional neural network. In our evaluation, the AUC for our gradient boosting model (GBM) was ~0.98, more than 3% better than any of the other models.
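
The compositional features mentioned above can be illustrated with the simplest member of that family: plain amino acid composition, the fraction of each of the 20 standard residues in a sequence. This is only a sketch; it is not the pseudo amino acid composition used by iAMP-2L, the reduced composition used by Deep-AmPEP30, or the 1209-feature set of this paper, and the function name and example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions, ordered as in AMINO_ACIDS.

    Characters outside the 20 standard residues are ignored in this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short sequence, used only to show the shape of the output.
vector = amino_acid_composition("GLKKLLKA")
print(dict(zip(AMINO_ACIDS, vector)))
```

Each peptide maps to a fixed-length vector regardless of its length, which is what lets a classifier such as an SVM or a GBM consume sequences of varying size.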

Data Collection and Feature Selection
Ensemble Gradient Boosting Model
Results and Discussion
Conclusions