Teaching an Old Dog New Tricks: Strategies That Improve Early Recognition in Similarity-Based Virtual Screening.

Ruifeng Liu,Mohamed Diwan M Abdulhameed,Anders Wallqvist

doi:10.3389/fchem.2019.00701

Abstract

High throughput screening (HTS) is an important component of lead discovery, with virtual screening playing an increasingly important role. Both methods typically suffer from lack of sensitivity and specificity against their true biological targets. With ever-increasing screening libraries and virtual compound collections, it is now feasible to conduct follow-up experimental testing on only a small fraction of hits. In this context, advances in virtual screening that achieve enrichment of true actives among top-ranked compounds (“early recognition”) and, hence, reduce the number of hits to test, are highly desirable. The standard ligand-based virtual screening method for large compound libraries uses a molecular similarity search method that ranks the likelihood of a compound to be active against a drug target by its highest Tanimoto similarity to known active compounds. This approach assumes that the distributions of Tanimoto similarity values to all active compounds are identical (i.e., same mean and standard deviation)—an assumption shown to be invalid (Baldi and Nasr, 2010). Here, we introduce two methods that improve early recognition of actives by exploiting similarity information of all molecules. The first method ranks a compound by its highest z-score instead of its highest Tanimoto similarity, and the second by an aggregated score calculated from its Tanimoto similarity values to all known actives and inactives (or a large number of structurally diverse molecules when information on inactives is unavailable). Our evaluations, which use datasets of over 20 HTS campaigns downloaded from PubChem, indicate that compared to the conventional approach, both methods achieve a ~10% higher Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) score—a metric of early recognition. Given the increasing use of virtual screening in early lead discovery, these methods provide straightforward means to enhance early recognition.

Highlights

Lead discovery by high throughput screening (HTS) is often described as a process akin to finding a needle in a haystack (Aherne et al, 2002)
Using HTS data, we demonstrate that, on average, the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) values derived from both methods are about 10% higher than those derived from the Max-Sim scoring method
The result for one dataset, NPC1, was an outlier, as the difference was as high as 170%, and the mean difference in BEDROC values decreased to 8.7% when it was excluded

Summary

Introduction

Lead discovery by high throughput screening (HTS) is often described as a process akin to finding a needle in a haystack (Aherne et al, 2002). The number of chemicals available for bioactivity testing has increased exponentially over the past decade. Despite the increased screening capacity, it remains impractical to assay a significant fraction of all available chemicals. The most widely used virtual screening methods are based on molecular similarity searches (Kristensen et al, 2013). These approaches typically rank molecules in a chemical library based on their structural similarity to a set of molecules known to be active against a desired target. Chemicals ranked high on the list can be acquired and tested for the desired activity or property

Methods

Results

Conclusion