Several methods have been developed for predicting antibacterial and antimicrobial peptides, but few attempts have been made to predict their minimum inhibitory concentration (MIC) values. In this study, we developed predictive models for the MIC values of antibacterial peptides against Escherichia coli (E. coli), using a dataset of peptides with experimentally determined MIC values: 3143 peptides for training and 786 for validation. We found that the Composition Enhanced Transition and Distribution (CeTD) attributes correlate significantly with MIC values. We first attempted to estimate MIC using BLAST similarity searches but found them inadequate. We then employed machine learning regression models that integrated various features, including peptide composition, binary profiles, and embeddings from large language models. Feature selection techniques, particularly minimum redundancy maximum relevance (mRMR), were used to refine the model inputs. Our Random Forest regressor, built using default parameters, achieved a correlation coefficient (R) of 0.78, an R² of 0.59, and an RMSE of 0.53 on the validation set. Our best model outperformed existing methods when benchmarked on an independent dataset of 498 anti-E. coli peptides. Additionally, we screened anti-E. coli proteins in the proteomes of three probiotic bacterial strains and created a web-based platform, “EIPpred”, enabling users to design peptides with desired MIC values (https://webs.iiitd.edu.in/raghava/eippred).
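The three validation metrics reported above can be illustrated with a minimal sketch. It assumes that R is the Pearson correlation between observed and predicted MIC values and that R² is the coefficient of determination; the abstract does not define either, nor the scale on which MIC is expressed. The function names and toy values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Toy values invented for illustration only (not data from the study):
# every prediction is offset from the observation by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print("R    =", pearson_r(observed, predicted))
print("R2   =", r_squared(observed, predicted))
print("RMSE =", rmse(observed, predicted))
```

Under these definitions, R² on held-out data need not equal the square of R, because a systematic offset in the predictions lowers R² without affecting the correlation. This would be consistent with the reported R of 0.78 alongside an R² of 0.59.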