Abstract

Pfizer Global Virtual Library (PGVL) of 10(13) readily synthesizable molecules offers a tremendous opportunity for lead optimization and scaffold hopping in drug discovery projects. However, mining into a chemical space of this size presents a challenge for the concomitant design informatics due to the fact that standard molecular similarity searches against a collection of explicit molecules cannot be utilized, since no chemical information system could create and manage more than 10(8) explicit molecules. Nevertheless, by accepting a tolerable level of false negatives in search results, we were able to bypass the need for full 10(13) enumeration and enabled the efficient similarity search and retrieval into this huge chemical space for practical usage by medicinal chemists. In this report, two search methods (LEAP1 and LEAP2) are presented. The first method uses PGVL reaction knowledge to disassemble the incoming search query molecule into a set of reactants and then uses reactant-level similarities into actual available starting materials to focus on a much smaller sub-region of the full virtual library compound space. This sub-region is then explicitly enumerated and searched via a standard similarity method using the original query molecule. The second method uses a fuzzy mapping onto candidate reactions and does not require exact disassembly of the incoming query molecule. Instead Basis Products (or capped reactants) are mapped into the query molecule and the resultant asymmetric similarity scores are used to prioritize the corresponding reactions and reactant sets. All sets of Basis Products are inherently indexed to specific reactions and specific starting materials. This again allows focusing on a much smaller sub-region for explicit enumeration and subsequent standard product-level similarity search. A set of validation studies were conducted. The results have shown that the level of false negatives for the disassembly-based method is acceptable when the query molecule can be recognized for exact disassembly, and the fuzzy reaction mapping method based on Basis Products has an even better performance in terms of lower false-negative rate because it is not limited by the requirement that the query molecule needs to be recognized by any disassembly algorithm. Both search methods have been implemented and accessed through a powerful desktop molecular design tool (see ref. (33) for details). The chapter will end with a comparison of published search methods against large virtual chemical space.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call