Abstract

The analysis of structure-activity relationships (SARs) becomes rather challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities need to be compared, which quickly goes beyond the capacity of subjective assessments. For a comprehensive large-scale exploration of SARs, computational analysis and visualization methods are required. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping, with chemical space networks (CSNs). The GTM component provides a global view of the activity landscapes of large compound data sets, in which informative local SAR environments are identified, augmented by a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of CSNs makes it possible to distinguish between regions having different SAR characteristics and select compound subsets that are rich in SAR information.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call