Abstract

Bioactive molecular compounds are essential for drug discovery. The biological activity of these compounds must be predicted because it determines their drug-target ability. Because ineffective drugs are discarded after production, wasting time and resources, it is important to predict bioactive molecules with models that have high predictive performance. This study uses a stacked ensemble, in which the predictions of multiple base classifiers serve as features for training a meta-classifier that makes the final prediction. Using three datasets, DS1, DS2, and DS3, obtained from the MDL Drug Data Report (MDDR) database, the performance of the stacked ensemble was compared with that of three other ensembles (Adaboost, Bagging, and Vote) on several evaluation criteria and with a statistical method, Kendall's W test. The accuracy of the stacked ensemble was 96.7002%, 98.2260%, and 94.9007% on the three datasets, respectively, although Vote had the best accuracy on dataset DS2, which consists of structurally homogeneous bioactive molecules. When Kendall's W test was used to rank the ensembles, the stacked ensemble was ranked best on datasets DS1 and DS3, with a mean rank of 4.00 on both and an overall level of agreement, W, of 0.986 and 1.000, respectively. On dataset DS2, it was ranked after Vote and Adaboost, with a mean rank of 2.33 and an overall level of agreement, W, of 0.857. The stacked ensemble is recommended for the prediction of heterogeneous bioactive molecules during drug discovery and can also be implemented in other research areas.
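
To make the ranking statistic concrete, the following is a minimal sketch of how Kendall's W can be computed when several evaluation criteria each rank the same set of ensembles. It uses the standard formula W = 12S / (m^2 (n^3 - n)) without a correction for tied ranks. The function name and the rank table are hypothetical and are not the study's data, and the study's own statistical software may handle ties differently.

```python
def kendalls_w(ranks):
    """Kendall's coefficient of concordance (no tie correction).

    ranks: list of m rankings, one per evaluation criterion; each ranking
    assigns the ranks 1..n to the same n ensembles, in the same order.
    """
    m = len(ranks)       # number of criteria ("judges")
    n = len(ranks[0])    # number of ensembles being ranked
    # Total rank received by each ensemble across all criteria.
    rank_sums = [sum(r[i] for r in ranks) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    # S: sum of squared deviations of the rank sums from their mean.
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def mean_ranks(ranks):
    """Average rank of each ensemble across the criteria."""
    m = len(ranks)
    return [sum(r[i] for r in ranks) / m for i in range(len(ranks[0]))]


# Hypothetical example: three criteria rank four ensembles identically,
# with 4 as the best rank. Full agreement gives W = 1 and a mean rank of
# 4.00 for the top ensemble.
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
w_full = kendalls_w(full_agreement)
best_mean_rank = max(mean_ranks(full_agreement))

# Hypothetical example with partial disagreement between the criteria.
partial_agreement = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 2, 4, 3]]
w_partial = kendalls_w(partial_agreement)
```

A value of W close to 1 means the criteria largely agree on the ordering of the ensembles, and a value close to 0 means they do not.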

Highlights

  • Bioactive molecular compounds are substances with positive effects on living organisms

  • The introduction of ensembles to prediction has allowed models to combine the performance ability of more than one classifier to improve the performance of the overall model

  • For ensembles requiring more than one base classifier, such as the Stacked and Vote ensembles, Support Vector Machine, k-Nearest Neighbor, Decision Tree, and Random Forest were used as the base classifiers, while Random Forest, being a high-performing classifier, was used for the Adaboost and Bagging ensembles and as the meta-classifier for the stacked ensemble
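
The difference between a Vote ensemble and a Stacked ensemble can be sketched in a few lines. This is an illustrative toy, not the study's implementation: the three threshold rules stand in for trained base classifiers such as SVM, k-NN, or Decision Tree, the lookup-table meta-classifier stands in for Random Forest, and all names and data points are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical base classifiers: fixed threshold rules on a 2-feature input.
# In a real pipeline these would be trained models.
def rule_a(x):
    return 1 if x[0] > 0.5 else 0

def rule_b(x):
    return 1 if x[1] > 0.5 else 0

def rule_c(x):
    return 1 if x[0] + x[1] > 1.2 else 0

BASE_CLASSIFIERS = [rule_a, rule_b, rule_c]


def meta_features(x):
    """Predictions of the base classifiers, used as features at the meta level."""
    return tuple(clf(x) for clf in BASE_CLASSIFIERS)


def vote_predict(x):
    """Vote ensemble: majority label among the base predictions."""
    return Counter(meta_features(x)).most_common(1)[0][0]


def fit_meta(samples, labels):
    """Toy meta-classifier: for each pattern of base predictions, store the
    most common true label seen in training. With trained base models, the
    patterns should come from held-out (cross-validated) predictions."""
    table = defaultdict(Counter)
    for x, label in zip(samples, labels):
        table[meta_features(x)][label] += 1
    return {pattern: counts.most_common(1)[0][0]
            for pattern, counts in table.items()}


def stacked_predict(meta, x):
    """Stacked ensemble: the meta-classifier decides; fall back to a vote
    for a pattern of base predictions never seen in training."""
    return meta.get(meta_features(x), vote_predict(x))


# Toy training data in which the true label follows the second feature only,
# so rule_b is reliable and the other two rules are sometimes wrong.
train_x = [(0.9, 0.4), (0.2, 0.9), (0.8, 0.8), (0.1, 0.1)]
train_y = [0, 1, 1, 0]
meta = fit_meta(train_x, train_y)

query = (0.95, 0.3)                            # true label would be 0
vote_label = vote_predict(query)               # two unreliable rules outvote rule_b
stacked_label = stacked_predict(meta, query)   # meta level has learned this pattern
```

In this toy case the vote follows the two unreliable rules, while the meta level has learned which pattern of base predictions corresponds to which label. That is the mechanism by which stacking can outperform a plain vote, although it does not always do so.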

Summary

INTRODUCTION

Bioactive molecular compounds are substances with positive effects on living organisms. They have toxicological and pharmacological effects on humans and animals. These compounds can be found in plants, fruits, and nuts, or they can be synthesized. A plant contains various compounds, including bioactive compounds. These may be present in any part of the plant and need to be extracted for further use [1]. Nopal cactus, or prickly pear, is a plant that has been used both in traditional medicine and for the properties of its bioactive compounds [4]. Ensembles can be likened to the greater productivity that results from teamwork compared with working alone.

RELATED WORK
RESULTS AND DISCUSSION
CONCLUSION
