Abstract

Microbial natural product discovery programs face two main challenges today: rapidly prioritizing strains for discovering new molecules and avoiding the rediscovery of already known molecules. Typically, these problems have been tackled using biological assays to identify promising strains and variance-modeling techniques such as principal component analysis (PCA) to highlight novel chemistry. While these tools have been successful in the past, datasets are becoming much larger and require a new approach. Because a PCA model depends on the members of the group being modeled, large datasets with many members make it difficult to accurately model the variance in the data. Our tool, hcapca, first groups strains based on the similarity of their chemical composition and then applies PCA to the smaller sub-groups, yielding more robust PCA models. This allows for scalable chemical comparisons among hundreds of strains with thousands of molecular features. As a proof of concept, we applied our open-source tool to a dataset of 1046 LCMS profiles of marine invertebrate-associated bacteria and discovered three new analogs of an established anticancer agent from one promising strain.
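
The two-stage idea in the abstract (group strains by chemical similarity, then run PCA inside each subgroup) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the hcapca implementation: the intensity matrix is invented, the choice of average linkage with Euclidean distance and a fixed number of subgroups is an assumption, and the function names `average_linkage_groups` and `pca_first_component` are hypothetical.

```python
import numpy as np


def average_linkage_groups(X, n_groups):
    """Naive agglomerative clustering (average linkage, Euclidean distance).

    Assumed stand-in for the grouping step; returns lists of row indices.
    """
    # Pairwise Euclidean distances between all samples (rows of X).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    groups = [[i] for i in range(len(X))]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average distance between every member of a and of b.
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


def pca_first_component(X):
    """Scores on PC1 and the fraction of variance PC1 explains."""
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, 0] * S[0]
    explained = S[0] ** 2 / np.sum(S ** 2)
    return scores, explained


# Hypothetical intensity matrix: 6 strains (rows) x 4 molecular features.
# Strains 0-2 share one chemotype, strains 3-5 another; strain 5 has an
# unusually intense fourth feature within its subgroup.
X = np.array([
    [10, 9, 0, 0],
    [9, 10, 0, 0],
    [10, 10, 0, 1],
    [0, 0, 10, 9],
    [0, 1, 9, 10],
    [0, 0, 10, 20],
], dtype=float)

# Stage 1: split the strains into chemically similar subgroups.
clusters = average_linkage_groups(X, n_groups=2)

# Stage 2: fit a separate PCA per subgroup and flag the strain that lies
# furthest from the subgroup centre along PC1.
most_distinct = {}
for members in clusters:
    scores, explained = pca_first_component(X[members])
    strain = members[int(np.argmax(np.abs(scores)))]
    most_distinct[tuple(sorted(members))] = (strain, float(explained))
    print(f"subgroup {members}: most distinct strain = {strain}")
```

Because each PCA model only has to describe the variance within one subgroup, the large between-group difference no longer dominates the first component, and a strain with unusual chemistry inside its own subgroup becomes easier to spot.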

Highlights

  • Natural product drug discovery programs continue to provide new and bio-medically relevant pharmacophores [1]

  • Although we have previously demonstrated that liquid chromatography-mass spectrometry (LCMS)-based metabolomics helps to partly address this problem, we found that there are fundamental limits to scaling these methods [6,7]

  • We have demonstrated that hcapca can leverage this property: it differentiated the A1901 metabolome from all other samples in its subgroup, which led to the discovery of the lomaiviticin analogs


Summary

Introduction

Natural product drug discovery programs continue to provide new and bio-medically relevant pharmacophores [1]. Many natural product discovery programs rely heavily on collecting source organisms from diverse ecological niches in an attempt to harness the biological and chemical diversity of these living systems. Tools that effectively survey sources of natural products before their chemistry is employed for drug discovery are critical for effective discovery programs. There are currently no good tools for handling large LCMS-based untargeted metabolomics datasets in a way that aligns with drug discovery goals. To meet this need, we developed a tool called hcapca that enables rapid assessment of chemical diversity using low-cost LCMS-based untargeted metabolomics.

