Efficient generation, storage, and manipulation of fully flexible pharmacophore multiplets and their use in 3-D similarity searching.

Edmond Abrahamian, Henning Thøgersen, Lars Nærum, Robert D Clark, Inge Thøger Christensen, Peter C Fox

doi:10.1021/ci025595r

Open Access

https://doi.org/10.1021/ci025595r

Abstract

Pharmacophore triplets and quartets have been used by many groups in recent years, primarily as a tool for molecular diversity analysis. In most cases, slow processing speeds and the very large size of the bitsets generated have forced researchers to compromise in how such multiplets were stored, manipulated, and compared, e.g., by using simple unions to represent multiplets for sets of molecules. Here we report using bitmaps in place of bitsets to reduce storage demands and to improve processing speed. In this context, a bitset is a fully enumerated string of zeros and ones, from which a compressed bitmap is obtained by replacing uniform blocks ("runs") of digits in the bitset with a pair of values identifying the content and length of the block (run-length encoding compression). High-resolution multiplets involving four features are enabled by using 64-bit executables to create and manipulate bitmaps, which "connect" to the 32-bit executables used for database access and feature identification via an Extensible Markup Language (XML) data stream. The encoding system supports simple pairs, triplets, and quartets; multiplets in which a privileged substructure is used as an anchor point; and augmented multiplets in which an additional vertex is added to represent a contingent feature, such as a hydrogen bond extension point linked to a complementary feature (e.g., a donor or an acceptor atom) in a base pair or triplet. It can readily be extended to larger, more complex multiplets as well. Database searching is one potential application of this technology. Consensus bitmaps built up from active ligands identified in preliminary screening can be used to generate hypothesis bitmaps, a process that allows differential weighting so that greater emphasis can be placed on bits arising from multiplets expected to be particularly discriminating.
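The bitset-to-bitmap compression described above can be illustrated with a minimal Python sketch. It assumes the simplest possible layout, a list of (bit, run length) pairs. The function names `rle_encode`, `rle_decode`, and `rle_and` are hypothetical, and the storage format actually used by the authors is not given in the abstract.

```python
from itertools import groupby


def rle_encode(bitset: str) -> list[tuple[str, int]]:
    """Compress a fully enumerated bitset ('0'/'1' string) into (bit, run length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bitset)]


def rle_decode(bitmap: list[tuple[str, int]]) -> str:
    """Expand a run-length-encoded bitmap back into the full bitset."""
    return "".join(bit * length for bit, length in bitmap)


def rle_and(a: list[tuple[str, int]], b: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Bitwise AND of two bitmaps of equal total length, without decompressing them."""
    out: list[tuple[str, int]] = []
    i = j = 0
    rem_a = a[0][1] if a else 0  # bits left in the current run of a
    rem_b = b[0][1] if b else 0  # bits left in the current run of b
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)  # overlap of the two current runs
        bit = "1" if a[i][0] == "1" and b[j][0] == "1" else "0"
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # merge with the previous run
        else:
            out.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return out


def popcount(bitmap: list[tuple[str, int]]) -> int:
    """Number of set bits, read directly from the run lengths."""
    return sum(length for bit, length in bitmap if bit == "1")
```

Because sparse multiplet bitsets consist mostly of long runs of zeros, operations such as `rle_and` and `popcount` touch one entry per run rather than one per bit, which is the kind of saving in storage and processing time that the abstract attributes to bitmaps.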
Such hypothesis bitmaps are shown to be useful queries for database searching, successfully retrieving active compounds across a range of structural classes from a corporate database. The current implementation allows multiconformer bitmaps to be obtained from pregenerated conformations or by random perturbation on the fly. The latter application involves random sampling of the full range of conformations not precluded by steric clashes, which limits the usefulness of classical fingerprint similarity measures. A new measure of similarity, the stochastic cosine, is introduced here to address this need. This new similarity measure uses the average number of bits common to independently drawn conformer sets to normalize the cosine coefficient. Its use frees the user from having to ensure strict comparability of starting conformations and from having to use fixed torsional increments, thereby allowing fully flexible characterization of pharmacophoric patterns.
