A data mining approach is introduced that automatically extracts structure-activity relationship (SAR) information from high-throughput screening data sets and helps to select active compounds for chemical exploration and hit-to-lead projects. SAR pathways, consisting of sequences of similar active compounds with gradual increases in potency, are systematically identified. Fully enumerated SAR pathway sets are subjected to pathway scoring, filtering, and mining, and the pathways with the most significant SAR information content are prioritized. High-scoring SAR pathways often reveal activity cliffs contained in screening data. Subsets of SAR pathways are analyzed in SAR trees, which make it possible to identify microenvironments of significant SAR discontinuity from which hits are preferentially selected. SAR trees of alternative pathways leading to activity cliffs identify key compounds and help to develop chemically intuitive SAR hypotheses.
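
The idea of an SAR pathway, a sequence of similar active compounds with gradually increasing potency, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the method itself: compounds are represented as toy feature sets, similarity is Tanimoto similarity on those sets with a hypothetical threshold of 0.5, and `score_pathway` (potency gain per step) is a stand-in for whatever scoring function the approach actually uses.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto similarity of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enumerate_pathways(
    features: Dict[str, FrozenSet[str]],
    potency: Dict[str, float],
    sim_threshold: float = 0.5,  # hypothetical cutoff, not from the source
) -> List[List[str]]:
    """Enumerate maximal sequences of similar compounds with increasing potency."""
    # A step c -> d is allowed if d is similar to c and strictly more potent.
    successors = {
        c: [
            d
            for d in features
            if potency[d] > potency[c]
            and tanimoto(features[c], features[d]) >= sim_threshold
        ]
        for c in features
    }
    has_predecessor = {d for succ in successors.values() for d in succ}
    pathways: List[List[str]] = []

    def extend(path: List[str]) -> None:
        nxt = successors[path[-1]]
        if not nxt:
            # Dead end: keep the path only if it has at least one step.
            if len(path) > 1:
                pathways.append(path)
            return
        for d in nxt:
            extend(path + [d])

    # Start only from compounds that no allowed step leads into.
    # Potency strictly increases along a path, so the recursion terminates.
    for c in features:
        if c not in has_predecessor:
            extend([c])
    return pathways


def score_pathway(path: List[str], potency: Dict[str, float]) -> float:
    """Illustrative score (an assumption): average potency gain per step."""
    return (potency[path[-1]] - potency[path[0]]) / (len(path) - 1)


# Toy data: feature sets and potency values (e.g. pIC50) are invented.
features = {
    "A": frozenset({"f1", "f2", "f3"}),
    "B": frozenset({"f1", "f2", "f3", "f4"}),
    "C": frozenset({"f1", "f2", "f4"}),
    "D": frozenset({"f9"}),
}
potency = {"A": 5.0, "B": 6.0, "C": 8.0, "D": 7.0}

pathways = enumerate_pathways(features, potency)
ranked = sorted(pathways, key=lambda p: score_pathway(p, potency), reverse=True)
```

In this toy set, the direct step from A to C ranks above the gradual route through B because it gains more potency per step; a large potency jump between two similar compounds is the kind of feature that an activity cliff represents. Compound D is active but structurally isolated, so it does not take part in any pathway.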