Computational Method for the Systematic Identification of Analog Series and Key Compounds Representing Series and Their Biological Activity Profiles.

Dagmar Stumpfe, Dilyana Dimova, Jürgen Bajorath

doi:10.1021/acs.jmedchem.6b00906

Abstract

A computational methodology is introduced for detecting all unique series of analogs in large compound data sets, regardless of the chemical relationships between analogs. No prior knowledge of core structures or R-groups is required; both are determined automatically. The approach is based on the generation of retrosynthetic matched molecular pairs and analog networks from which distinct series are isolated. The methodology was applied to systematically extract more than 17 000 distinct series from the ChEMBL database. For comparison, analog series were also isolated from screening compounds and drugs. Known biological activities were mapped to series from ChEMBL, and in more than 13 000 of these series, key compounds were identified that represented the substitution sites of all analogs within a series and its complete activity profile. The analog series, key compounds, and activity profiles are made freely available as a resource for medicinal chemistry applications.
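
The step from an analog network to distinct series can be illustrated with a minimal sketch. It assumes that matched molecular pairs have already been generated (the retrosynthetic fragmentation itself is not shown) and that each series corresponds to one connected component of the network; the abstract does not state this, so it is an assumption made for illustration. The function name `isolate_series` and the compound identifiers are hypothetical.

```python
from collections import defaultdict


def isolate_series(pairs):
    """Group compounds into series, taken here as connected components.

    `pairs` is an iterable of (compound_a, compound_b) tuples, each standing
    for one matched molecular pair (hypothetical input format).
    """
    # Build an undirected adjacency map: compounds are nodes, pairs are edges.
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)

    seen = set()
    series = []
    for start in sorted(network):
        if start in seen:
            continue
        # Depth-first traversal collects every compound reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        series.append(sorted(component))
    return series


# Hypothetical compound identifiers; A1-A3 and B1-B2 form two separate series.
pairs = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
series = isolate_series(pairs)
print(series)
```

Under this assumption, compounds linked directly or through intermediate analogs end up in the same series, and no core structure has to be specified in advance.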

Full Text