Abstract

Studying analog series to find structural transformations that enhance the activity and ADME properties of lead compounds is an important part of drug development. Matched molecular pair (MMP) search is a powerful tool for analog analysis that imitates researchers' ability to select pairs of compounds that differ only by small well-defined transformations. Abstraction is a challenge for existing MMP search algorithms, which can result in the omission of relevant, inexact MMPs, and inclusion of irrelevant, contextually dissimilar MMPs. In this work, we present a new method for MMP search that returns approximate results and enables flexible control over abstraction of contextual information. We illustrate the concepts and mechanics of our method with a series of exemplar MMP queries, and then benchmark search accuracy using MMPs found by fragment indexing. We show that we can search for MMPs in a context dependent manner, and accurately approximate context independent fragment index based MMP search over a range of fingerprint and dataset conditions. Our method can be used to search for pairwise correspondences among analog sets and bolster MMP datasets where data is missing or incomplete.

Highlights

  • Matched molecular pair (MMP) are a useful tool to study analog relationships and local QSAR, but current MMP search methods are brittle compared to intuitive notions of what constitutes a matched analog pair

  • Efficient index based search methods enforce precisely defined context independent transformations that can miss near MMPs relevant to an analysis

  • Previous iterations of vector based MMP search enforce strict context dependence and feature set coupling that can fail to group together transformations occurring in different contexts

Read more

Summary

Introduction

Successful optimization of lead compounds requires the iterative application of structural modifications that yield favorable changes in target activity profiles and ADMET properties. This process has been the sole domain of medicinal chemistry teams. Researchers assembled analog series data by hand, guided by their knowledge of compounds that had been synthesized and tested within their organization They would generate hypotheses using techniques such as FreeWilson analysis [1], Hansch analysis [2], Topliss schemes [3], and Craig plots [4]; and combine data driven insights with expertise to prioritize compounds for synthesis and testing in each design iteration. The challenge has driven innovation in search and index of analog sets in chemical libraries

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call