Abstract

PubChem is a scientific showcase of the NIH Roadmap Initiatives. It is a compound repository created to facilitate information exchange and data sharing between the NIH Roadmap-funded Molecular Library Screening Center Network (MLSCN) and the scientific community. Because PubChem holds more than 10 million compound records, screening the whole database for drug candidates is challenging. The purpose of the present study was therefore to develop a data-mining cheminformatics approach for constructing a representative, structurally diverse sublibrary from the large PubChem database. A new chemically diverse representative subset, rePubChem, was selected by whole-molecule chemistry-space matrix calculation using the cell-based partition algorithm. The subset was then evaluated by compound property analyses based on 1D and 2D molecular descriptors, and by self-similarity analysis based on 2D molecular fingerprints in comparison with the source compound library. The new subset is much smaller (540K compounds) and has minimal similarity and redundancy, yet it retains the structural diversity and basic molecular properties of its parent library (5.3 million compounds). It could be a valuable structurally diverse compound resource for in silico virtual screening and in vitro HTS drug screening. In addition, the established subset generation method, which combines cell-based chemistry-space partition metrics with pairwise 2D fingerprint-based similarity searching, should be useful to a broad scientific community interested in acquiring structurally diverse compounds for efficient drug screening, building representative virtual combinatorial chemistry libraries for synthesis, and data mining large compound databases such as PubChem.
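
The two ideas combined in the abstract, cell-based partitioning of a chemistry space and pairwise fingerprint similarity, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: compounds are represented by a few numeric descriptor coordinates, each axis is cut into equal-width bins, the first compound seen in each occupied cell is kept as its representative, and fingerprints are modelled as sets of on-bit positions compared with the Tanimoto coefficient. The function names `cell_index`, `select_representatives`, and `tanimoto` are hypothetical and do not come from the study.

```python
from typing import Dict, List, Sequence, Set, Tuple


def cell_index(point: Sequence[float], lows: Sequence[float],
               highs: Sequence[float], bins: int) -> Tuple[int, ...]:
    """Map a descriptor vector to the index of the cell that contains it."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        # Equal-width bins per axis; fall back to width 1.0 on a flat axis.
        width = (hi - lo) / bins or 1.0
        # Clamp so the maximum value lands in the last bin.
        idx.append(min(int((x - lo) / width), bins - 1))
    return tuple(idx)


def select_representatives(points: List[Sequence[float]], bins: int) -> List[int]:
    """Keep one compound (the first encountered) per occupied cell.

    Assumption: "first encountered" is an arbitrary illustrative rule,
    not the selection rule used in the study.
    """
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    chosen: Dict[Tuple[int, ...], int] = {}
    for i, p in enumerate(points):
        chosen.setdefault(cell_index(p, lows, highs, bins), i)
    return sorted(chosen.values())


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy data: four compounds in a hypothetical 2D descriptor space.
points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.6, 0.2)]
subset = select_representatives(points, bins=2)

# Toy fingerprints (on-bit positions) for a pairwise similarity check.
fp_a, fp_b = {1, 2, 3}, {2, 3, 4}
similarity = tanimoto(fp_a, fp_b)
```

In this toy run the first two compounds fall into the same cell, so only one of them is retained. A lower average pairwise Tanimoto value within the selected subset than within the parent set would indicate reduced redundancy, which is the kind of self-similarity comparison the abstract describes.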

