Application of Unsupervised Ligand Clustering in Computer‐aided Drug Discovery

Mengzhi Fu, Josh Beckham

doi:10.1096/fasebj.2021.35.s1.01796

Abstract

Virtual screening has greatly accelerated drug discovery by incorporating computational techniques into the drug development pipeline. Despite decades of improvements to screening algorithms and strategies, virtual screening programs still often require extensive computational power to search more than a million compounds for drug candidates. To reduce this cost, most studies apply a filter that excludes all compounds failing certain chemical criteria, such as Lipinski's rule of five. This is not ideal, because molecules that do not exhibit the typical chemical properties of drugs may still be valuable. This study aims to incorporate unsupervised clustering methods such as K-Means as the first-pass screening strategy for a large molecular library. Because large compound libraries often contain multiple entries of highly similar or identical compounds, CPU time can be saved by clustering these compounds and screening samples taken from each cluster. Clusters that are unlikely to contain drug leads can then be skipped, rather than screening every compound in them. Initial tests of this approach showed a 4-fold increase in sensitivity when screening 30 million compounds, with less than 1-fold additional CPU time overhead.
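The cluster-then-sample idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names K-Means but does not specify descriptors, cluster counts, sample sizes, or the rule for skipping clusters, so everything below is an assumption. The two-dimensional descriptor vectors are toy data, the farthest-point initialization is a simplification chosen to keep the example deterministic, and `select_clusters` with its `score` and `threshold` arguments is a hypothetical stand-in for whatever screening score a real pipeline would use.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-Means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def sample_representatives(labels, per_cluster=2, seed=0):
    """Pick up to `per_cluster` compound indices at random from each cluster."""
    rng = random.Random(seed)
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return {
        label: rng.sample(members, min(per_cluster, len(members)))
        for label, members in clusters.items()
    }


def select_clusters(points, representatives, score, threshold):
    """Hypothetical skip rule: keep a cluster only if its best sampled
    representative scores at or above `threshold`."""
    return [
        label
        for label, reps in representatives.items()
        if max(score(points[i]) for i in reps) >= threshold
    ]


# Toy library: nine "compounds" described by two made-up descriptors.
library = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.1),
]
labels, centroids = kmeans(library, k=3)
representatives = sample_representatives(labels, per_cluster=2)
# Placeholder score: the first descriptor stands in for a real screening score.
kept = select_clusters(library, representatives, score=lambda p: p[0], threshold=4.0)
print("cluster labels:", labels)
print("sampled representatives:", representatives)
print("clusters kept for full screening:", kept)
```

In this sketch only the sampled representatives are scored in the first pass, and the remaining members of a cluster would be screened only if that cluster is kept. A library of the size mentioned in the abstract would need an optimized clustering implementation and chemically meaningful descriptors rather than this pure-Python loop.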
