Abstract
Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing them are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows core biosynthetic enzymes to be identified with genome mining strategies based on the sequence similarity of the enzymes and genes involved. However, mining for many different types of BGCs quickly reaches a level of complexity at which manual analysis is no longer feasible, so automated genome mining pipelines such as the antiSMASH software are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats, such as rule-based BGC detection, sequence and annotation quality, and cluster boundary prediction, all of which have to be considered when planning, performing and analyzing genome mining studies.
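The interplay between rule-based detection and cluster boundary prediction can be illustrated with a minimal Python sketch. It assumes a hypothetical gene model (`Gene`) whose profile hits have already been computed, hypothetical detection rules (`RULES`), and hypothetical `cutoff` and `neighbourhood` distances in base pairs. It is loosely inspired by the idea of rule-based detection, not a reproduction of antiSMASH's actual rules or code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene model with pre-computed profile hits."""
    name: str
    start: int        # start coordinate in bp
    end: int          # end coordinate in bp
    hits: frozenset   # names of core-enzyme profiles hitting the gene product

# Hypothetical rules: a cluster type is assigned when ALL listed profiles are
# hit by genes within `cutoff` bp of a seed gene. Not antiSMASH's real rules.
RULES = {
    "T1PKS": frozenset({"PKS_KS", "PKS_AT"}),
    "NRPS": frozenset({"Condensation", "AMP-binding"}),
}

def detect_regions(genes, rules, cutoff=20_000, neighbourhood=20_000):
    """Return merged (start, end, type) regions predicted by the rules."""
    cores = []
    for ctype, required in rules.items():
        for seed in genes:
            if not seed.hits & required:
                continue
            # Genes close to the seed that carry at least one required profile.
            near = [g for g in genes
                    if g.hits & required
                    and g.start <= seed.end + cutoff
                    and g.end >= seed.start - cutoff]
            found = frozenset().union(*(g.hits for g in near))
            if required <= found:
                cores.append((min(g.start for g in near),
                              max(g.end for g in near), ctype))
    # Crude boundary estimate: extend each core by a fixed neighbourhood.
    spans = sorted(((max(0, s - neighbourhood), e + neighbourhood, t)
                    for s, e, t in cores), key=lambda x: (x[0], x[1]))
    # Merge overlapping spans; different types merged together form a hybrid.
    merged = []
    for start, end, ctype in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(ctype)
        else:
            merged.append([start, end, {ctype}])
    return [(s, e, "-".join(sorted(types))) for s, e, types in merged]

# Toy input (made-up coordinates and hits).
genes = [
    Gene("pksA", 5_000, 11_000, frozenset({"PKS_KS", "PKS_AT"})),
    Gene("orf2", 12_000, 13_000, frozenset()),
    Gene("nrpA", 30_000, 36_000, frozenset({"Condensation"})),
    Gene("nrpB", 36_500, 42_000, frozenset({"AMP-binding"})),
    Gene("orf5", 150_000, 151_000, frozenset({"AMP-binding"})),
]
print(detect_regions(genes, RULES))                       # [(0, 62000, 'NRPS-T1PKS')]
print(detect_regions(genes, RULES, neighbourhood=5_000))  # [(0, 16000, 'T1PKS'), (25000, 47000, 'NRPS')]
```

With a 20 kb neighbourhood, the PKS and NRPS cores in this toy input are reported as a single hybrid region; with 5 kb, they are reported separately, and the isolated AMP-binding hit on `orf5` never satisfies a rule. The boundary parameter alone changes the biological interpretation, which is why predicted cluster borders are best treated as estimates.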
Highlights
Most antibiotics, such as penicillin, erythromycin or tetracycline, and other drugs like acarbose, artemisinin, tacrolimus or cyclosporins are so-called natural products either synthesized by or derived from microorganisms or plants [1]
Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic
Although we focus on antiSMASH as an example, the issues discussed apply to natural product genome mining in general and are relevant when using other tools
Summary
Most antibiotics, such as penicillin, erythromycin or tetracycline, and other drugs like acarbose (anti-diabetic), artemisinin (anti-malarial), tacrolimus or cyclosporins (immunosuppressants) are so-called natural products either synthesized by or derived from microorganisms or plants [1]. To predict secondary metabolite biosynthesis pathways, genome mining approaches commonly start by identifying conserved biosynthetic genes. Their gene products are then analyzed to gain information about their putative function in biosynthesis and, sometimes, their substrate specificity.

The most commonly used tool for working with profile hidden Markov models (pHMMs) in biology is HMMer [46]. Many profile databases, such as PFAM [47] and TIGRFAMs [48], provide downloadable profiles compatible with HMMer. antiSMASH uses pHMMs with profiles specific to conserved core enzymes of secondary metabolite biosynthesis pathways to run its profile-based BGC detection.

For BGCs encoding nonribosomal peptide synthetases (NRPSs), polyketide synthases (PKSs), terpene synthases or ribosomally synthesized and post-translationally modified peptides (RiPPs), additional analyses can predict further details, such as substrate specificities or product cyclization patterns. To this end, it is sometimes necessary to classify proteins or domains that share a high overall sequence similarity. ClusterBlast contains a comprehensive database of all predicted BGCs from publicly available genomes that is
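The profile-based detection step described in the summary can be sketched in Python as a post-processing stage for HMMer output: run `hmmsearch` with its tabular output option (for example `hmmsearch --tblout hits.tbl core_profiles.hmm proteins.faa`, where the file names are hypothetical), then keep only hits whose full-sequence bitscore reaches a per-profile threshold. The profile names, thresholds and sample lines below are assumptions for illustration, not antiSMASH's actual profile set or cutoffs.

```python
# Hypothetical per-profile bitscore thresholds (illustrative values only).
CUTOFFS = {"PKS_KS": 50.0, "AMP-binding": 20.0}

def parse_tblout(lines):
    """Yield (target, query, evalue, score) from hmmsearch --tblout lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=18)
        # Columns 1, 3, 5, 6: target name, query name, full-sequence E-value, bitscore.
        yield fields[0], fields[2], float(fields[4]), float(fields[5])

def core_hits(lines, cutoffs):
    """Map each protein to the core profiles it hits at or above the cutoff."""
    hits = {}
    for target, query, _evalue, score in parse_tblout(lines):
        if query in cutoffs and score >= cutoffs[query]:
            hits.setdefault(target, set()).add(query)
    return hits

# Made-up lines in the --tblout column layout (hypothetical proteins and scores).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score ...
pksA - PKS_KS - 1.2e-80 265.3 0.1 1.5e-80 265.0 0.1 1.0 1 0 0 1 1 1 1 -
orf7 - PKS_KS - 0.003 12.4 0.2 0.004 12.0 0.2 1.1 1 0 0 1 1 1 0 -
nrpB - AMP-binding - 3.1e-40 140.2 0.0 4.0e-40 139.9 0.0 1.0 1 0 0 1 1 1 1 -
"""

print(core_hits(SAMPLE.splitlines(), CUTOFFS))
# {'pksA': {'PKS_KS'}, 'nrpB': {'AMP-binding'}}
```

The weak `orf7` hit is discarded because its bitscore falls below the threshold for its profile. Keeping thresholds per profile rather than global reflects that profiles differ in length and conservation, so a single score cutoff would be too strict for some and too lenient for others.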