Abstract

Identifying small-molecule protein–protein interaction modulators (PPIMs) is a promising research direction for drug discovery, cancer treatment, and other fields. In this study, we developed SELPPI, a stacking ensemble computational framework based on a genetic algorithm and tree-based machine learning methods, to predict new modulators targeting protein–protein interactions. Specifically, extremely randomized trees (ExtraTrees), adaptive boosting (AdaBoost), random forest (RF), cascade forest, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost) were used as base learners, and seven types of chemical descriptors served as input features. A primary prediction was obtained from each base learner–descriptor pair. Each of the six methods was then trained in turn on the primary predictions as a candidate meta learner, and the best-performing one was retained as the meta learner. Finally, the genetic algorithm selected the optimal subset of primary predictions as input to the meta learner, whose secondary prediction gave the final result. We systematically evaluated our model on the pdCSM-PPI datasets. To our knowledge, our model outperformed all existing models on these datasets.
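The final selection step can be illustrated with a minimal sketch. It is not the SELPPI implementation: the primary predictions are synthetic 0/1 columns, a majority vote stands in for the trained meta learner, and the fitness (accuracy on the same toy labels), the function names, and all hyperparameters are assumptions chosen for illustration. In practice the fitness would more plausibly be a validation score of the real meta learner.

```python
import random


def make_toy_predictions(n_samples=200, n_good=4, n_noise=8, seed=0):
    """Synthetic stand-in for primary predictions: one 0/1 column per
    (hypothetical) base learner-descriptor pair."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    columns = []
    for j in range(n_good + n_noise):
        acc = 0.85 if j < n_good else 0.5  # informative vs. coin-flip columns
        columns.append([y if rng.random() < acc else 1 - y for y in labels])
    return columns, labels


def fitness(mask, columns, labels):
    """Accuracy of a majority vote over the selected columns.
    Assumption: the vote replaces a trained meta learner."""
    chosen = [col for bit, col in zip(mask, columns) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes = sum(col[i] for col in chosen)
        pred = 1 if 2 * votes > len(chosen) else 0
        correct += pred == y
    return correct / len(labels)


def genetic_select(columns, labels, pop_size=30, generations=40,
                   mutation_rate=0.05, seed=1):
    """Evolve bit masks over the columns; return the best mask and its fitness."""
    rng = random.Random(seed)
    n = len(columns)
    # Start from random masks plus the "use everything" mask as a baseline.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size - 1)]
    population.append([1] * n)
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda m: fitness(m, columns, labels),
                        reverse=True)
        elite = ranked[: pop_size // 2]  # elitism: the top half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)    # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=lambda m: fitness(m, columns, labels))
    return best, fitness(best, columns, labels)


if __name__ == "__main__":
    cols, y = make_toy_predictions()
    mask, score = genetic_select(cols, y)
    print("selected columns:", [i for i, bit in enumerate(mask) if bit])
    print("vote accuracy on toy data:", score)
```

Because the top half of each generation survives unchanged and the all-ones mask is in the starting population, the returned subset never scores below the baseline of using every primary prediction.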
