Abstract
Studying analog series to find structural transformations that enhance the activity and ADME properties of lead compounds is an important part of drug development. Matched molecular pair (MMP) search is a powerful tool for analog analysis that imitates researchers' ability to select pairs of compounds that differ only by small, well-defined transformations. Abstraction is a challenge for existing MMP search algorithms, which can result in the omission of relevant, inexact MMPs and the inclusion of irrelevant, contextually dissimilar MMPs. In this work, we present a new method for MMP search that returns approximate results and enables flexible control over the abstraction of contextual information. We illustrate the concepts and mechanics of our method with a series of exemplar MMP queries, and then benchmark search accuracy using MMPs found by fragment indexing. We show that we can search for MMPs in a context-dependent manner, and accurately approximate context-independent, fragment-index-based MMP search over a range of fingerprint and dataset conditions. Our method can be used to search for pairwise correspondences among analog sets and to bolster MMP datasets where data are missing or incomplete.
Highlights
Matched molecular pairs (MMPs) are a useful tool for studying analog relationships and local QSAR, but current MMP search methods are brittle compared to intuitive notions of what constitutes a matched analog pair
Efficient index-based search methods enforce precisely defined, context-independent transformations that can miss near-MMPs relevant to an analysis
Previous iterations of vector-based MMP search enforce strict context dependence and feature-set coupling that can fail to group together transformations occurring in different contexts
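To make the brittleness of exact, index-based matching concrete, the following is a minimal sketch of fragment-index MMP search on toy data. It is not the method presented in this work, nor any library's API: the fragment strings, molecule identifiers, and the helper names `index_fragments` and `find_mmps` are all hypothetical, and the fragmentations are written by hand rather than produced by a cheminformatics toolkit.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations:
# (molecule id, context fragment, substituent fragment).
# Real pipelines would generate these with a toolkit; the strings are illustrative only.
FRAGMENTS = [
    ("mol_A", "c1ccccc1[*:1]", "[*:1]Cl"),
    ("mol_B", "c1ccccc1[*:1]", "[*:1]F"),
    ("mol_C", "c1ccccc1[*:1]", "[*:1]OC"),
    ("mol_D", "c1ccncc1[*:1]", "[*:1]Cl"),  # near analog of mol_A with a different ring
]


def index_fragments(fragments):
    """Group (molecule id, substituent) entries under their exact context string."""
    index = defaultdict(list)
    for mol_id, context, substituent in fragments:
        index[context].append((mol_id, substituent))
    return index


def find_mmps(index):
    """Pair molecules that share an identical context but differ in substituent."""
    pairs = []
    for context, entries in index.items():
        for (id_a, sub_a), (id_b, sub_b) in combinations(entries, 2):
            if sub_a != sub_b:
                pairs.append((id_a, id_b, f"{sub_a}>>{sub_b}", context))
    return pairs


mmps = find_mmps(index_fragments(FRAGMENTS))
for pair in mmps:
    print(pair)
```

In this toy set, the three phenyl compounds pair with one another, while `mol_D` is never paired because its context string differs from the others. Exact context matching of this kind is what can cause near-MMPs to be missed.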
Summary
Successful optimization of lead compounds requires the iterative application of structural modifications that yield favorable changes in target activity profiles and ADMET properties. This process has been the sole domain of medicinal chemistry teams. Researchers assembled analog series data by hand, guided by their knowledge of compounds that had been synthesized and tested within their organization. They would generate hypotheses using techniques such as Free-Wilson analysis [1], Hansch analysis [2], Topliss schemes [3], and Craig plots [4], and combine data-driven insights with expertise to prioritize compounds for synthesis and testing in each design iteration. The challenge has driven innovation in the search and indexing of analog sets in chemical libraries.
More From: Computational and Structural Biotechnology Journal