Ensemble Learning for Large Scale Virtual Screening on Apache Spark

Karima Sid, Mohamed Batouche

doi:10.1007/978-3-319-89743-1_22

Abstract

Virtual screening (VS) is an in silico tool for drug discovery that identifies candidate drugs by computationally screening large libraries of small molecules. Various ligand-based and structure-based virtual screening approaches have been proposed in recent decades. Machine learning (ML) techniques have been widely applied in the drug discovery and development process, predominantly in ligand-based virtual screening. Ensemble learning is a common ML paradigm in which many models are trained on data for the same problem and their results are combined into one improved prediction. Applying VS to massive molecular libraries (Big Data) is computationally intensive, so splitting the data into chunks to parallelize and distribute the task has become necessary. For many years, MapReduce has been applied successfully on clusters to solve problems with very large datasets, but with some limitations. Apache Spark is an open-source framework for Big Data processing that overcomes the shortcomings of MapReduce. In this paper, we propose a new approach based on the ensemble learning paradigm in Apache Spark to improve large-scale virtual screening in terms of execution time and precision. We generate a new training dataset to evaluate our approach. The experimental results show good predictive performance, up to 92% precision, with an acceptable execution time.
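
The abstract describes two ideas that a small example can make concrete: training several models on chunks of the data and combining their outputs, and scoring the result by precision. The sketch below is only an illustration in plain Python, not the authors' Spark pipeline. The abstract does not name the base learners or the combination rule, so the nearest-centroid classifier, the majority vote, the two-dimensional toy descriptors, and all function names are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def split_into_chunks(rows, n_chunks):
    """Deal rows round-robin into n_chunks lists (stand-ins for data partitions)."""
    return [rows[i::n_chunks] for i in range(n_chunks)]


def train_centroid_model(chunk):
    """Fit a toy nearest-centroid classifier on one chunk of (features, label) rows."""
    by_label = {}
    for features, label in chunk:
        by_label.setdefault(label, []).append(features)
    # One centroid per class: the per-dimension mean of that class's vectors.
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict_one(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def ensemble_predict(models, features):
    """Combine the base models' predictions by majority vote (assumed rule)."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]


def precision(y_true, y_pred, positive=1):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: label 1 = active, label 0 = inactive.
TRAIN = [
    ([0.9, 0.8], 1), ([0.1, 0.2], 0),
    ([0.8, 0.9], 1), ([0.2, 0.1], 0),
    ([0.7, 0.9], 1), ([0.3, 0.2], 0),
    ([0.9, 0.7], 1), ([0.2, 0.3], 0),
    ([0.8, 0.8], 1), ([0.1, 0.1], 0),
    ([0.7, 0.7], 1), ([0.3, 0.3], 0),
]

# One base model per chunk; on a cluster each chunk would be trained in parallel.
MODELS = [train_centroid_model(chunk) for chunk in split_into_chunks(TRAIN, 3)]
```

Calling `ensemble_predict(MODELS, features)` on a new descriptor vector returns the majority label of the three base models. In a distributed setting the per-chunk training step is the part that would be spread across workers, while the vote is a cheap final aggregation.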

Highlights

Discovering a new drug is a very expensive and long process
The results show that Apache Spark is a very powerful tool for Big Data machine learning
We present a new approach based on the ensemble learning paradigm and Apache Spark to enhance the performance of the large-scale virtual screening process

Summary

Introduction

Discovering a new drug is a very expensive and long process. With the very fast increase in the size of compound libraries, high-throughput screening (HTS) becomes expensive and yields a small number of hits with high false-positive and false-negative rates [1, 2]. Virtual screening (VS) is a pre-screening technique, cheaper and faster than HTS, that has been successfully applied to reduce (filter) the number of compounds to be screened and to generate new drug leads [2, 3]. There are two strategies for virtual screening: ligand-based (LBVS) and structure-based (SBVS) [3]. In LBVS, existing information about known ligands is used to find compounds that best match a given query; this strategy can work in the absence of structural information about the target [4]. In SBVS, structural information about the target (generally a protein) is required [4].
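
The LBVS idea of finding compounds that best match a query can be illustrated with a similarity ranking. The text does not say which similarity measure is used, so the sketch below assumes the Tanimoto coefficient over sets of "on" fingerprint bits, a common choice in ligand-based screening. The compound names and bit sets are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of the bits that are set.
QUERY = {1, 4, 7, 9}
LIBRARY = {
    "cmpd_a": {1, 4, 7, 9},
    "cmpd_b": {1, 4, 8},
    "cmpd_c": {2, 3, 5},
}

RANKED = rank_by_similarity(QUERY, LIBRARY)
```

Compounds at the top of `RANKED` share the most fingerprint bits with the query and would be the first candidates passed on for further screening. No structural information about the target is needed for this step.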


Publication Date: Jan 1, 2018
Citations: 2
License type: cc-by

Similar Papers

Big data processing frameworks and architectures: a survey
Raghavendra Kumar Chunduri ... Aswani Kumar Cherukuri
07 Jul 2021

A Survey of Scheduling Tasks in Big Data: Apache Spark
Balqees Talal Hasan ... Dhuha Basheer Abdullah
01 Jan 2021

Large Scale Distributed Data Science using Apache Spark
James G Shanahan ... Laing Dai
10 Aug 2015

Identification of a dual TAOK1 and MAP4K5 inhibitor using a structure-based virtual screening approach
Min-Wu Chao ... Kai-Cheng Hsu
Journal of Enzyme Inhibition and Medicinal Chemistry | VOL. 36
09 Nov 2020
