Abstract

Background: Natural product databases are important in drug discovery and other research areas. An analysis of its structural content, as well as functional group occurrence, provides a useful overview, as well as a means of comparison with related databases. BIOFACQUIM is an emerging database of natural products characterized and isolated in Mexico. Herein, we discuss the results of a first systematic functional group analysis and global diversity of an updated version of BIOFACQUIM. Methods: BIOFACQUIM was augmented through a literature search and data curation. A structural content analysis of the dataset was performed. This involved a functional group analysis with a novel algorithm to automatically identify all functional groups in a molecule and an assessment of the global diversity using consensus diversity plots. To this end, BIOFACQUIM was compared to two major and large databases: ChEMBL 25, and a herein assembled collection of natural products with 169,839 unique compounds. Results: The structural content analysis showed that 15.7% of compounds and 11.6% of scaffolds present in the current version of BIOFACQUIM have not been reported in the other large reference datasets. It also gave a diversity increase in terms of scaffolds and molecular fingerprints regarding the previous version of the dataset, as well as a higher similarity to the assembled collection of natural products than to ChEMBL 25, in terms of diversity and frequent functional groups. Conclusions: A total of 148 natural products were added to BIOFACQUIM, which meant a diversity increase in terms of scaffolds and fingerprints. Regardless of its relatively small size, there are a significant number of compounds and scaffolds that are not present in the reference datasets, showing that curated databases of natural products, such as BIOFACQUIM, can serve as a starting point to increase the biologically relevant chemical space.

Highlights

  • Natural product databases are important in drug discovery and other research areas

  • Database overlap To assess the chemical space not covered by ChEMBL and natural products (NPs) but by BIOFACQUIM, we characterized the structural content of the three datasets in terms of unique compounds, scaffolds and functional groups and determined the overlap among them

  • It should be noted that despite its small size in comparison with ChEMBL and NPs, 15.7% of compounds present in BIOFACQUIM have not been reported in those other major datasets, as well as 11.6% of its scaffolds

Read more

Summary

Introduction

Natural product databases are important in drug discovery and other research areas. An analysis of its structural content, as well as functional group occurrence, provides a useful overview, as well as a means of comparison with related databases. This involved a functional group analysis with a novel algorithm to automatically identify all functional groups in a molecule and an assessment of the global diversity using consensus diversity plots To this end, BIOFACQUIM was compared to two major and large databases: ChEMBL 25, and a assembled collection of natural products with 169,839 unique compounds. Results: The structural content analysis showed that 15.7% of compounds and 11.6% of scaffolds present in the current version of BIOFACQUIM have not been reported in the other large reference datasets It gave a diversity increase in terms of scaffolds and molecular fingerprints regarding the previous version of the dataset, as well as a higher similarity to the assembled collection of natural products than to ChEMBL 25, in terms of diversity and frequent functional groups. Regardless of its relatively small size, there are a significant number of compounds and scaffolds that are not present in the reference datasets, showing that curated databases of natural products, such as BIOFACQUIM, can serve as a starting point to increase the biologically relevant chemical space

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call