Abstract

A computational methodology is introduced for detecting all unique series of analogs in large compound data sets, regardless of chemical relationships between analogs. No prior knowledge of core structures or R-groups is required; both are determined automatically. The approach is based on the generation of retrosynthetic matched molecular pairs and analog networks from which distinct series are isolated. The methodology was applied to systematically extract more than 17 000 distinct series from the ChEMBL database. For comparison, analog series were also isolated from screening compounds and drugs. Known biological activities were mapped to series from ChEMBL, and in more than 13 000 of these series, key compounds were identified that represent the substitution sites of all analogs within a series and its complete activity profile. The analog series, key compounds, and activity profiles are made freely available as a resource for medicinal chemistry applications.
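
As a rough illustration of the network step only, the following minimal sketch assumes that the matched molecular pairs have already been computed and that each distinct series corresponds to one connected component of the resulting analog network. That reading is an assumption made here for illustration and is not spelled out in the abstract. The function name analog_series and the compound identifiers are hypothetical, and the retrosynthetic fragmentation that produces the pairs is not shown.

```python
from collections import defaultdict


def analog_series(pairs):
    """Group compounds into series, taken here as connected components
    of the network whose edges are the given compound pairs.

    `pairs` is an iterable of (compound_id, compound_id) tuples.
    Assumption: one connected component equals one analog series.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of the two compounds in every pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each component.
    groups = defaultdict(set)
    for compound in list(parent):
        groups[find(compound)].add(compound)

    # Largest series first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda s: (-len(s), s))


# Hypothetical compound identifiers, for illustration only.
pairs = [("C1", "C2"), ("C2", "C3"), ("C4", "C5")]
series = analog_series(pairs)
print(series)
```

In this sketch, compounds that are linked only indirectly (C1 and C3 via C2) still end up in the same series, which is what lets a series grow beyond the direct partners of any single compound.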

