Stochastic similarity selections from large combinatorial libraries

Victor S Lobanov, Dimitris K Agrafiotis

doi:10.1021/ci990109u

Abstract

A stochastic procedure for similarity searching in large virtual combinatorial libraries is presented. The method avoids explicit enumeration and calculation of descriptors for every virtual compound, yet provides an optimal or nearly optimal similarity selection in a reasonable time frame. It is based on the principle of probability sampling and the recognition that each reagent is represented in a combinatorial library by multiple products. The method proceeds in three stages. First, a small fraction of the products is selected at random and ranked according to their similarity against the query structure. The top-ranking compounds are then identified and deconvoluted into a list of "preferred" reagents. Finally, all the cross-products of these preferred reagents are enumerated in an exhaustive manner, and systematically compared to the target to obtain the final selection. This procedure has been applied to produce similarity selections from several virtual combinatorial libraries, and the dependency of the quality of the selections on several selection parameters has been analyzed.
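The three stages described above can be illustrated with a minimal sketch. This is not the authors' implementation. The function name, the parameters (n_sample, n_top, n_select), the rule that every reagent appearing in a top-ranking product counts as "preferred", and the toy similarity function are all assumptions made for illustration; the abstract does not specify the descriptors, the similarity measure, or the reagent-selection criterion.

```python
import itertools
import random


def stochastic_similarity_selection(reagents, similarity, n_sample, n_top,
                                    n_select, seed=0):
    """Sketch of a three-stage stochastic similarity selection.

    reagents   -- list of reagent lists, one per variation site
    similarity -- callable scoring a product (tuple of reagents) against
                  the query; higher means more similar (assumed convention)
    """
    rng = random.Random(seed)

    # Stage 1: draw random products (one reagent per site) and rank them.
    sample = {tuple(rng.choice(site) for site in reagents)
              for _ in range(n_sample)}
    ranked = sorted(sample, key=similarity, reverse=True)

    # Stage 2: deconvolute the top-ranking products into per-site lists of
    # "preferred" reagents. Assumption: any reagent seen in a top product
    # is kept; a frequency threshold would be another plausible rule.
    top = ranked[:n_top]
    preferred = [sorted({product[i] for product in top})
                 for i in range(len(reagents))]

    # Stage 3: exhaustively enumerate the cross-products of the preferred
    # reagents and rank them against the query.
    candidates = itertools.product(*preferred)
    return sorted(candidates, key=similarity, reverse=True)[:n_select]


# Toy two-site library: reagents are integers standing in for descriptor
# values, and the "query" is the hypothetical product (10, 40).
site_a = list(range(50))
site_b = list(range(50))


def toy_similarity(product):
    a, b = product
    return -(abs(a - 10) + abs(b - 40))


selection = stochastic_similarity_selection(
    [site_a, site_b], toy_similarity, n_sample=200, n_top=20, n_select=5)
```

In this toy setup only 200 of the 2500 possible products are scored in the first stage, and the exhaustive comparison in the third stage is limited to the cross-products of the reagents found in the 20 best sampled products.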
