The extensive bioactivity data available in public databases such as ChEMBL have facilitated in-depth structure-activity relationship (SAR) analyses, which are essential for comprehensively understanding the impact of molecular modifications on biological activity. A central strategy in SAR analysis is the assessment of molecular similarity, and several approaches favored by medicinal chemists have been developed to efficiently capture structurally related compounds on a large scale. Single-atom modifications (SAMs) are a popular molecular editing strategy in hit-to-lead and lead optimization; we previously introduced four types of SAMs as a chemical similarity criterion and conducted a systematic analysis of their application in compound design. In this study, we expanded the analysis to cover 10 common SAMs, including carbon-nitrogen (N↔C), O↔C, N↔O, and S↔O exchanges, as well as simpler modifications such as OH↔H, CH3↔H, and halogen-hydrogen (F, Cl, Br, I↔H) exchanges. Leveraging high-confidence bioactivity data from ChEMBL (version 34), we assembled a comprehensive dataset comprising 374,979 SAM pairs. After evaluating how frequently these SAM types occur in medicinal chemistry efforts, we focused on SAM-induced activity cliffs (ACs) and identified more than 7400 of them, substantially expanding the current knowledge base of ACs associated with single-atom changes. Furthermore, structural analysis of these ACs, supported by experimental data, provides critical insights into the role of SAMs in modulating compound activity and offers practical guidance for the structure-based optimization of molecular properties in drug development. All identified ACs, along with their associated structural information, are made openly available.
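The step from SAM pairs to SAM-induced ACs can be illustrated with a minimal sketch. It assumes each pair carries two potency values on a negative log10 molar scale (for example pKi) measured against the same target, and it flags a pair as an AC when the values differ by at least 2 log units, i.e. a 100-fold potency difference. That threshold is a commonly used AC criterion and is an assumption here; the abstract does not state the criterion actually applied. The names `SamPair`, `is_activity_cliff`, and `cliffs_by_type` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamPair:
    """One pair of compounds differing by a single-atom modification."""
    sam_type: str         # label such as "N<->C" or "F<->H"
    target_id: str        # target both compounds were tested against
    p_activity_a: float   # -log10(potency in mol/L) of compound A
    p_activity_b: float   # -log10(potency in mol/L) of compound B


def is_activity_cliff(pair: SamPair, min_delta: float = 2.0) -> bool:
    """Flag the pair as an AC if potencies differ by >= min_delta log units.

    min_delta=2.0 (100-fold) is an assumed threshold, not taken from the study.
    """
    return abs(pair.p_activity_a - pair.p_activity_b) >= min_delta


def cliffs_by_type(pairs, min_delta: float = 2.0) -> dict:
    """Count ACs per SAM type; types without any AC are omitted."""
    counts = {}
    for pair in pairs:
        if is_activity_cliff(pair, min_delta):
            counts[pair.sam_type] = counts.get(pair.sam_type, 0) + 1
    return counts
```

Grouping the counts by SAM type mirrors the kind of per-modification comparison the abstract describes, although the study's actual pipeline and data schema may differ.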