Deep Sequence Models for Ligand-Based Virtual Screening

P N Pournami, Viswajit Vinod Nair, G Gopakumar, Vaishnavi Sudheer Nair, P B Jayaraj, Sonaal Pathlai Pradeep

doi:10.1142/s2737416522500107

Abstract

In the past few years, machine learning techniques have taken the limelight in multiple research domains. One domain that has benefited is computer-aided drug discovery, where methods such as virtual screening reduce the search space for candidate drug molecules. State-of-the-art sequential neural network models have shown promising results elsewhere, and we aim to reproduce that success in virtual screening using molecular information encoded in the simplified molecular-input line-entry system (SMILES). Our work uses two attention-based sequential models: a long short-term memory (LSTM) network with attention, and ChemBERTa, an optimized version of the transformer network designed specifically to handle SMILES. We also propose the “Overall Screening Efficacy”, an averaging metric that aggregates and encapsulates model performance over multiple datasets. We found an overall improvement of about [Formula: see text] over the benchmark model, which relied on parallelized random forests.
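To make the SMILES input concrete, the following is a minimal sketch of how a SMILES string might be split into tokens and mapped to integer ids before being fed to a sequence model. It is an illustration only, not the authors' preprocessing: the regular expression, the function names (tokenize, build_vocab, encode), the padding id 0, and the example molecules are all assumptions introduced here.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms such as [NH3+], the two-letter halogens Br and Cl,
# then any remaining single character (atoms, bonds, ring digits, branches).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length id list, right-padded with 0."""
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Illustrative molecules: ethanol, benzene, a dihalide, methylammonium.
molecules = ["CCO", "c1ccccc1", "ClCCBr", "C[NH3+]"]
vocab = build_vocab(molecules)
encoded = [encode(s, vocab, max_len=10) for s in molecules]
```

Each row of encoded has the same length, so the rows can be stacked into a batch for a model such as an LSTM. A pretrained transformer such as ChemBERTa would normally be used with its own tokenizer rather than a hand-written one like this.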

Full Text