Background: Phage therapy has broad application prospects as a novel therapeutic method. Phage virion proteins (PVPs) recognize the host and bind to its surface receptors, which makes them important for developing antimicrobial drugs against bacterial infectious diseases. In recent years, several machine learning-based PVP predictors have been developed, most of which train the learner on a single feature type. In contrast, higher-dimensional feature representations tend to contain more potential sequence information. Methods: In this work, we construct a stacking model, PredPVP, for PVP prediction by combining multiple features and applying feature selection. Specifically, each sequence is first encoded using seven features. Three feature selection methods were then utilized to remove redundant features from this high-dimensional representation, and the selected features were used to train eight machine learning algorithms. Finally, the probability features and class features (PCFs) generated by the resulting 24 base models were fed into logistic regression (LR) to train the final model. Results: On the independent test set, PredPVP outperforms existing predictors, with an AUC of 93.4%. Conclusion: We expect PredPVP to be used as a tool for large-scale PVP recognition, providing a new way for the development of novel antimicrobials and accelerating its application in actual treatment. The datasets and source codes used in this study are available at https://github.com/caoqian23/PredPVP.
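The stacking scheme described in the Methods can be illustrated with a minimal sketch. It is not the PredPVP implementation: the synthetic data stands in for the encoded sequences, and the three selectors, the eight classifiers, the 0.5 threshold for the class feature, and the use of 5-fold out-of-fold predictions are all assumptions chosen for illustration, since the abstract does not name them. Only the overall shape (3 selection methods × 8 algorithms = 24 base models, each contributing one probability feature and one class feature to an LR meta-learner) follows the text.

```python
from functools import partial

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import (SelectFromModel, SelectKBest, f_classif,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the concatenated sequence encodings (NOT real PVP data).
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Three placeholder feature selectors (assumed; the abstract does not name them).
selectors = [
    SelectKBest(f_classif, k=20),
    SelectKBest(partial(mutual_info_classif, random_state=0), k=20),
    SelectFromModel(ExtraTreesClassifier(n_estimators=30, random_state=0)),
]

# Eight placeholder learners (assumed; the abstract does not name them).
classifiers = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    KNeighborsClassifier(),
    LinearDiscriminantAnalysis(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=30, random_state=0),
    ExtraTreesClassifier(n_estimators=30, random_state=0),
    GradientBoostingClassifier(n_estimators=30, random_state=0),
]

# 3 selectors x 8 classifiers = 24 base models; selection is refit inside each fold.
base_models = [make_pipeline(clone(s), clone(c))
               for s in selectors for c in classifiers]

train_cols, test_cols = [], []
for model in base_models:
    # Out-of-fold probabilities keep the meta-learner from seeing leaked labels.
    oof = cross_val_predict(model, X_train, y_train, cv=5,
                            method="predict_proba")[:, 1]
    train_cols += [oof, (oof >= 0.5).astype(float)]   # probability + class feature
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    test_cols += [prob, (prob >= 0.5).astype(float)]

# PCF matrices: two columns per base model, 48 columns in total.
meta_train = np.column_stack(train_cols)
meta_test = np.column_stack(test_cols)

# Logistic regression as the meta-learner, as in the abstract.
meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_train)
auc = roc_auc_score(y_test, meta_model.predict_proba(meta_test)[:, 1])
print(f"base models: {len(base_models)}, PCF columns: {meta_train.shape[1]}, "
      f"AUC on synthetic data: {auc:.3f}")
```

The AUC printed here reflects only the synthetic data and says nothing about PredPVP's reported performance. Wrapping each selector and classifier in one pipeline is a design choice of this sketch, so that feature selection never sees the held-out fold.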