Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening.

Zhonglin Cao, Simone Sciabola, Ye Wang

doi:10.1021/acs.jcim.3c01938

Abstract

Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As commercially available compound collections grow exponentially to the scale of billions, active learning and Bayesian optimization have recently proven effective at narrowing down the search space. An essential component of these methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate surrogate achieves high sample efficiency: it finds hits after only a fraction of the library has been virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can therefore boost the accuracy and sample efficiency of active learning-based virtual screening.
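The active learning loop summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the library is a synthetic list of one-dimensional features, the "docking" oracle is a toy quadratic where lower scores are better, the surrogate is a small nearest-neighbor average standing in for the pretrained models, and the acquisition rule is plain greedy selection. The function names (`make_toy_library`, `knn_predict`, `active_learning_screen`, `top_k_recall`) are hypothetical.

```python
import random


def make_toy_library(n, seed=0):
    """Synthetic library (assumption): 1-D features, lower score = better hit."""
    rng = random.Random(seed)
    features = [rng.random() for _ in range(n)]
    scores = [(x - 0.3) ** 2 for x in features]
    return features, scores


def knn_predict(x, labeled, n_neighbors=3):
    """Toy surrogate: mean score of the nearest already-screened compounds."""
    nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(features, oracle, init_size, batch_size, n_rounds, seed=0):
    """Screen a random initial batch, then greedily acquire the compounds
    that the surrogate predicts to have the lowest scores."""
    rng = random.Random(seed)
    pool = set(range(len(features)))
    acquired = rng.sample(sorted(pool), init_size)
    pool -= set(acquired)
    labels = {i: oracle(i) for i in acquired}  # the expensive screening step

    for _ in range(n_rounds):
        # "Retrain" the surrogate on everything screened so far.
        labeled = [(features[i], labels[i]) for i in acquired]
        # Rank the unscreened pool by predicted score and take the best batch.
        ranked = sorted(pool, key=lambda i: knn_predict(features[i], labeled))
        batch = ranked[:batch_size]
        for i in batch:
            labels[i] = oracle(i)
        acquired.extend(batch)
        pool -= set(batch)
    return acquired


def top_k_recall(acquired, scores, k):
    """Fraction of the true top-k (lowest-score) compounds that were screened."""
    true_top = set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
    return len(true_top & set(acquired)) / k


features, scores = make_toy_library(2000)
acquired = active_learning_screen(
    features, oracle=lambda i: scores[i],
    init_size=50, batch_size=50, n_rounds=4,
)
recall = top_k_recall(acquired, scores, k=100)
print(f"screened {len(acquired) / len(features):.1%} of the library, "
      f"top-100 recall = {recall:.1%}")
```

The recall printed at the end is the same kind of metric the abstract reports. For scale, 0.6% of 99.5 million compounds is about 597,000 screened compounds, and 58.97% of the top 50,000 is about 29,485 compounds recovered.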
