Scaffold analysis of PubChem database as background for hierarchical scaffold-based visualization

Jakub Velkoborsky, David Hoksza

doi:10.1186/s13321-016-0186-7

Abstract

Background

Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design. Especially in drug design, modern methods of high-throughput screening generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of compounds based on their molecular scaffolds, a concept widely used by medicinal chemists to group molecules of similar properties. This classification can then be utilized for intuitive visualization of compounds.

Results

In this paper, we propose a scaffold hierarchy as a result of large-scale analysis of the PubChem Compound database. The analysis not only provides insights into the scaffold diversity of the PubChem Compound database, but also enables scaffold-based hierarchical visualization of user compound data sets on the background of empirical chemical space, as defined by the PubChem data, or on the background of any other user-defined data set. The visualization is performed by a web-based client-server application called Scaffvis. It provides an interactive zoomable tree map visualization of data sets of up to hundreds of thousands of molecules. Scaffvis is free to use and its source code has been published under an open source license.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-016-0186-7) contains supplementary material, which is available to authorized users.

Highlights

Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design
To be able to visualize a molecular dataset with different levels of detail with respect to the molecular scaffolds, we need a set of scaffold definitions forming a hierarchy, optimally a tree hierarchy
Scaffold visualizer: in the previous section we described the generator and the results it yields when run against the PubChem Compound set

Summary

Introduction

Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design. In drug design, modern methods of high-throughput screening generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of compounds based on their molecular scaffolds, a concept widely used by medicinal chemists to group molecules of similar properties. With the growing sizes of existing chemical libraries, it is becoming increasingly important to be able to explore and analyze those libraries to gain insight into their composition. For this purpose, visualization is an indispensable tool.
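To make the idea of scaffold-based classification with several levels of detail concrete, the following minimal sketch arranges compounds in a tree and reports the group sizes that a zoomable tree map could use as rectangle areas. It is an illustration only, not the Scaffvis implementation: the names `ScaffoldNode`, `build_tree` and `zoom`, the level labels and the example paths are all hypothetical, and each molecule is assumed to arrive already annotated with its chain of scaffolds from the coarsest to the most specific.

```python
class ScaffoldNode:
    """One scaffold in the hierarchy, with the number of molecules under it."""

    def __init__(self, key):
        self.key = key
        self.count = 0
        self.children = {}

    def child(self, key):
        # Create the child on first use so the tree grows with the data.
        if key not in self.children:
            self.children[key] = ScaffoldNode(key)
        return self.children[key]


def build_tree(paths):
    """Build a scaffold tree from per-molecule paths (coarse -> specific)."""
    root = ScaffoldNode("root")
    for path in paths:
        node = root
        node.count += 1
        for key in path:
            node = node.child(key)
            node.count += 1
    return root


def zoom(root, prefix):
    """Return (scaffold, molecule count) pairs one level below `prefix`.

    Results are sorted by decreasing count, then by key, which is the
    order a tree map would typically use when laying out rectangles.
    """
    node = root
    for key in prefix:
        node = node.children[key]
    pairs = [(c.key, c.count) for c in node.children.values()]
    return sorted(pairs, key=lambda kc: (-kc[1], kc[0]))


# Hypothetical input: the level labels below are made up for illustration
# and are not the scaffold definitions proposed in the paper.
EXAMPLE_PATHS = [
    ("1 ring", "6-membered ring", "c1ccccc1"),
    ("1 ring", "6-membered ring", "c1ccncc1"),
    ("1 ring", "5-membered ring", "c1ccoc1"),
    ("2 rings", "fused 6-6", "c1ccc2ccccc2c1"),
]

if __name__ == "__main__":
    tree = build_tree(EXAMPLE_PATHS)
    print(zoom(tree, ()))           # top-level view
    print(zoom(tree, ("1 ring",)))  # after zooming into one group
```

Because every molecule follows exactly one path from the root, the counts of sibling nodes sum to the count of their parent, which is the property that lets a tree hierarchy be drawn as nested rectangles without overlap.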

Methods

Results

Conclusion