Abstract

We present a novel approach, ChemMaps, for visualizing chemical space based on similarity matrices of compound datasets computed from molecular fingerprints. The method uses a 'satellites' approach, in which satellites are molecules whose similarity to the rest of the molecules in the database provides, in principle, sufficient information to generate a visualization of the chemical space. Such an approach could make chemical space visualization more efficient, since only the similarities to the satellites, rather than the full pairwise matrix, need to be computed. Here we describe a proof-of-principle application of the method to several databases with different diversity measures. Unsurprisingly, we found that the method works better for databases with low 2D diversity. 3D diversity played a secondary role, although it seems to become more relevant as 2D diversity increases. For less diverse datasets, selecting as few as 25% of the molecules as satellites seems to be sufficient for a fair depiction of the chemical space. We propose increasing the number of satellites iteratively, in steps of 5% of the whole database, and stopping when the new and the previous chemical space representations are highly correlated. This Research Note represents a first exploratory step, prior to the full application of this method to several datasets.
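
The satellite idea and the proposed stopping rule can be sketched as follows. This is a minimal illustration that rests on several assumptions not stated in the abstract:

- Fingerprints are 0/1 NumPy arrays.
- Similarity is measured with the Tanimoto coefficient.
- Satellites are drawn at random, because no selection criterion is given here.
- "Correlate highly" is read as an absolute Pearson correlation of at least 0.9 between matching principal components of successive maps.

All function names are hypothetical, and random bit vectors stand in for real fingerprints.

```python
import numpy as np


def tanimoto(fps_a: np.ndarray, fps_b: np.ndarray) -> np.ndarray:
    """Tanimoto similarity between every row of fps_a and every row of fps_b (0/1 arrays)."""
    a = fps_a.astype(np.float64)
    b = fps_b.astype(np.float64)
    common = a @ b.T                                               # |A and B|
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - common  # |A or B|
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, common / union, 0.0)            # empty pairs -> 0


def pca_scores(matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Scores on the leading principal components, via SVD of the column-centered matrix."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def min_abs_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Worst |Pearson r| over matched components (abs() because PC signs are arbitrary)."""
    r = [np.corrcoef(x[:, i], y[:, i])[0, 1] for i in range(x.shape[1])]
    return float(np.min(np.abs(r)))  # np.min propagates NaN, so a NaN never passes


def satellite_map(fps, rng, step=0.05, threshold=0.9, n_components=2):
    """Grow the satellite set by `step` of the database until successive maps agree.

    Returns the 2D coordinates and the number of satellites used.
    """
    n = len(fps)
    order = rng.permutation(n)  # hypothetical choice: satellites drawn at random
    n_steps = int(round(1.0 / step))
    previous, previous_k = None, 0
    for i in range(1, n_steps + 1):
        k = min(n, max(n_components, int(round(i * step * n))))
        if k == previous_k:
            continue  # database too small for this step to add a new satellite
        # N x k matrix: similarity of every molecule to each satellite only.
        coords = pca_scores(tanimoto(fps, fps[order[:k]]), n_components)
        if previous is not None and min_abs_correlation(coords, previous) >= threshold:
            return coords, k
        previous, previous_k = coords, k
    return previous, previous_k


rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(200, 512))  # stand-in for real fingerprints
coords, n_satellites = satellite_map(fps, rng)
```

The key saving is that each iteration builds an N x k similarity matrix instead of the full N x N one. Comparing successive maps rather than a full-matrix map keeps the stopping rule usable when the full matrix is too costly to compute.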

Highlights

  • We conclude that for datasets with lower 2D diversity (CREBBP and L3MBTL3, see Table 1), using around 25% of the compounds as satellites is enough to obtain a high correlation (≥ 0.9) with the gold standard (i.e., PCA on the whole similarity matrix), whereas for more 2D-diverse datasets, i.e., DNMT1 and SMARCA2, up to 75% of the compounds could be needed to ensure a high correlation.
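
The comparison against the gold standard can be expressed as a simple check. The exact correlation measure is not given here, so the sketch below assumes one. It takes the absolute Pearson correlation between matching principal components of the satellite-based map and the full-matrix map, and requires every component to reach 0.9. The function names are hypothetical, and random arrays stand in for real maps.

```python
import numpy as np


def component_agreement(satellite_coords: np.ndarray, full_coords: np.ndarray) -> np.ndarray:
    """|Pearson r| between matching principal components of two maps.

    The absolute value is taken because the sign of a principal component is arbitrary.
    """
    return np.array([
        abs(np.corrcoef(satellite_coords[:, i], full_coords[:, i])[0, 1])
        for i in range(full_coords.shape[1])
    ])


def is_fair_depiction(satellite_coords, full_coords, threshold=0.9) -> bool:
    """True if every compared component reaches the (assumed) correlation threshold."""
    return bool(np.all(component_agreement(satellite_coords, full_coords) >= threshold))


rng = np.random.default_rng(0)
full = rng.normal(size=(100, 2))                  # stand-in gold-standard map
close = -full + 0.05 * rng.normal(size=(100, 2))  # same map, flipped sign, small noise
unrelated = rng.normal(size=(100, 2))             # independent map
print(is_fair_depiction(close, full), is_fair_depiction(unrelated, full))  # True False
```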


Summary

Introduction

Visual representation of chemical space has multiple implications in drug discovery for virtual screening, library design and comparison of compound collections, among others[1]. Among the many methods for exploring chemical space, principal component analysis (PCA) of pairwise similarity matrices computed with structural fingerprints has been used to analyze compound datasets[2,3]. Other approaches use molecular representations other than structural fingerprints, such as physicochemical properties or complexity descriptors, or methods other than PCA, such as multidimensional scaling and neural networks[5,6]. Among PCA-based representations of chemical space there have been “chemical satellite” approaches, such as ChemGPS. These select satellite molecules that need not be included in the database being visualized but that have extreme features placing them as outliers, with the intention of covering as much of the chemical space as possible[7,8,9,10]. A related, more recent approach, Similarity Mapplet, enables the visualization of very large chemical libraries through PCA of different molecular features, including structural ones[11].
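
For reference, the fingerprint-based PCA mentioned above, applied to the full pairwise similarity matrix, can be sketched in a few lines. This is a minimal sketch under two assumptions: binary fingerprints stored as 0/1 NumPy arrays, and Tanimoto similarity. The helper names are hypothetical, and random bit vectors stand in for real fingerprints.

```python
import numpy as np


def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for a 0/1 fingerprint array of shape (n, n_bits)."""
    x = fps.astype(np.float64)
    common = x @ x.T                 # bits set in both molecules
    counts = x.sum(axis=1)           # bits set in each molecule
    union = counts[:, None] + counts[None, :] - common
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pairs of empty fingerprints get similarity 0 instead of 0/0.
        return np.where(union > 0, common / union, 0.0)


def pca_scores(matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project each row onto the leading principal components (SVD of the centered matrix)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(50, 1024))  # stand-in for real fingerprints
sim = tanimoto_matrix(fps)                 # 50 x 50, symmetric, ones on the diagonal
coords = pca_scores(sim)                   # 50 x 2 coordinates for a 2D map
```

Each row of the similarity matrix acts as a descriptor vector for one molecule, so the cost grows with the square of the number of molecules. Satellite-style approaches aim to reduce this cost.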
