Abstract

Virtual screening is one of the most common computer-aided drug design techniques that apply computational tools and methods on large libraries of molecules to extract the drugs. Ensemble learning is a recent paradigm launched to improve machine learning results in terms of predictive performance and robustness. It has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning on huge molecular libraries is computationally expensive. Hence, the distribution and parallelisation of the task have become a significant step by using sophisticated frameworks such as Apache Spark. In this paper, we propose a new approach HEnsL_DLBVS, for heterogeneous ensemble learning, distributed on Spark to improve the large-scale LBVS results. To handle the problem of imbalanced big training datasets, we propose a novel hybrid technique. We generate new training datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach with satisfactory accuracy and its superiority over homogeneous models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call