Transposon mutagenesis in bacterial natural product discovery.

Abstract

Transposon mutagenesis has re-emerged as a powerful and versatile strategy for discovering and characterising specialised metabolites encoded by biosynthetic gene clusters (BGCs). While genomics has revealed an enormous diversity of putative BGCs across bacteria, many remain silent, weakly expressed, or genetically intractable, necessitating experimental tools that can link genotype to chemical output. Transposons provide an unbiased and broadly applicable platform for disrupting, activating, or modulating gene expression without relying on homologous recombination, making them particularly valuable in challenging microbial hosts. Here, we review the major applications of transposon mutagenesis in natural product discovery, providing examples that highlight discoveries made using phenotype- and bioactivity-guided screens, phenotype-independent strategies, and transposon-based engineering of heterologous expression platforms. Transposon technologies provide flexible and scalable tools for activating, characterising, and engineering microbial BGCs. As genome mining continues to unearth rich seams of unexplored metabolic potential, these tools will remain essential for converting genetic predictions into chemical discovery.

Similar Papers
  • Research Article
  • Citations: 25
  • 10.1111/1751-7915.12184
NextGen microbial natural products discovery.
  • Dec 27, 2014
  • Microbial Biotechnology
  • Claudia Schmidt‐Dannert

Small-molecule secondary metabolites isolated from microorganisms and plants provide the chemical scaffolds of a large fraction of today's pharmaceuticals. Evolutionary forces shaped the molecular complexity of these natural products, which contributes to their exquisite binding to biological targets. Starting with the discovery of penicillin by Fleming, we have seen a rapid increase in the discovery and production of natural products and derivatives thereof as antibiotics and other drugs. But once the 'easy to access' bioactive compounds had been isolated, the drug discovery pipeline slowed down, beginning in the 1990s. Pharmaceutical companies turned away from natural products as screening programmes led to the rediscovery of known structures, and the development of structurally complex natural products into drugs using synthetic methods proved challenging and too expensive if no reliable biological sources were available. Considering the urgent need for the development of new drugs to combat multidrug-resistant pathogens and overcome long-term side-effects and/or reduction in effectiveness of current drugs, unlocking nature's treasure trove of small-molecule chemodiversity will be crucial for next-generation drug development (Gerwick and Moore, 2012; Basmadjian et al., 2014; Genilloud, 2014). Driven by advances in sequencing, gene synthesis, bioinformatics and metabolomics, the natural products discovery process is beginning to undergo a major transformation, away from the tedious isolation, screening and dereplication process and towards in silico-based bioprospecting approaches that seek to eventually transform genomic information directly into biosynthetic outputs (Lewis, 2013; Deane and Mitchell, 2014). The explosion in the number of available microbial genome sequences has given us a glimpse of the hidden natural product biosynthetic capacity of these organisms. Based on known sequence information for enzymes involved in synthesizing, e.g., the scaffolds of bioactive polyketides, non-ribosomal peptides or terpenes, numerous gene clusters (fungi) and operons (bacteria) can be identified in microbial genomes that are silent and for which no secondary metabolite products have been identified. This also includes many well-studied natural products producers such as Streptomyces and Aspergillus strains that express only a subset of their secondary metabolome under typical laboratory growth conditions (Brakhage, 2013; Doroghazi and Metcalf, 2013; Lim and Keller, 2014; Rebets et al., 2014). Our sequencing capacity is outpacing, by orders of magnitude, our ability to identify natural products gene clusters and, most importantly, to translate this sequence information into screenable molecules. The number of sequenced microbial genomes is rapidly approaching 5000, the large majority of which are bacterial, with only a few hundred fungal genomes available. With this large number of sequences available, the question becomes 'How does one most effectively search this vast sequence space for interesting natural products pathways?' One approach commonly used is to focus on a few groups of bacteria or fungi known to produce bioactive natural products, comprehensively identify natural products biosynthetic operons or gene clusters within their genomes, and then target the most diverse biosynthetic gene clusters for characterization. In many cases, products of target gene clusters are not produced at all or only at very low levels under laboratory growth conditions, requiring gene cluster activation either through exogenous stimuli or through manipulation of genetic control elements, which may be strain specific and laborious. If a strain is genetically tractable, gene disruption can then be used to specifically characterize biosynthetic gene functions.
This 'reverse discovery' approach has been quite successfully used in genome-driven bioprospecting for a number of natural products identified in bacteria and some filamentous fungi (Lewis, 2013; Deane and Mitchell, 2014; Jensen et al., 2014). Such 'reverse discovery' strategies, however, are limited to microorganisms that can be cultivated in the laboratory and that can be genetically manipulated, leaving out the enormous biosynthetic diversity found in unculturable microbial species, such as many higher fungi (see below), and in complex microbial ecosystems. Recent work has shown that metagenomic libraries from microbial ecosystems can be successfully arrayed and screened for large biosynthetic gene clusters of interest based on homology to conserved regions of known biosynthetic genes such as non-ribosomal peptide synthetases or polyketide synthases (Owen et al., 2013). Fungi have a tremendous capacity for natural products biosynthesis, yet only a relatively small fraction of their large biodiversity has been explored so far. Natural products pathways have mostly been characterized from a relatively small subset of Ascomycota, including filamentous fungi like Aspergillus, Penicillium and Fusarium that are genetically tractable and can be readily cultured in the laboratory (Lazarus et al., 2014). Basidiomycota, including the mushroom-forming fungi, have received almost no attention so far, despite the fact that they may have a quite distinct arsenal of natural products (Quin et al., 2014). Genome surveys of the few hundred genomes in the Joint Genome Institute's Fungal Genomics database show that we have barely scratched the surface of the biosynthetic potential encoded in the small number of sequenced genomes, which represent a minuscule fraction of fungal diversity.
A major reason for the slow progress in characterizing the secondary metabolome of many fungi (especially many Basidiomycota) is that they are frequently hard to work with: laboratory growth may be slow or not possible, and the genetic tools so readily available for bacteria and filamentous fungi are largely absent. The future of natural products and drug discovery will be greatly influenced by how quickly the scientific community can develop strategies that will enable us to move away from the slow approaches for pathway identification and characterization that depend first on growing the producer organism and then on its being genetically tractable to some extent. Instead, we should take full advantage of rapid and affordable whole genome sequencing, RNAseq and DNA synthesis, with which we can move rapidly from in silico biosynthetic pathway identification into a high-throughput synthetic biology workflow with the concurrent analytical profiling of heterologously assembled expression libraries. Implementation of such an in silico to natural products discovery platform begins with the accurate identification and structural annotation of biosynthetic pathways and genes in genomic data. A number of bioinformatics tools have been developed for genomic bioprospecting (Weber, 2014), but these tools rely on algorithms trained with hidden Markov models derived from known biosynthetic genes. These models need to be expanded to capture a larger biosynthetic diversity. Coding information will then be directly used to synthesize corresponding genetic constructs suitable for high-throughput pathway assembly, which could be done using already existing synthetic biology methods (Cobb et al., 2014). Precise structural gene annotations will be essential for such an envisioned high-throughput synthetic biology workflow that relies on gene synthesis and assembly. From our own experience, we know that gene annotations in the genomes of many fungi are incorrect.
Basidiomycota genes are very intron rich, and many small introns and exons are incorrectly predicted using available models. Deep RNA sequencing of a cross-section of microbial species (fungi and bacteria) that can be grown in the lab will be crucial to develop algorithms for accurate structural annotation. High-resolution transcriptomic analysis of diverse species will enable the construction of gene co-expression networks. Networks built on physical distance to seed genes that are frequently associated with natural products biosynthetic pathways (e.g. cytochrome P450s, group transferases, transporters) could be a means for the discovery of novel pathways and of sequences for broader in silico searches. Considering that microbial secondary metabolite pathways are typically clustered and gene expression is co-regulated, such network analysis will be a powerful method for accurate delineation of biosynthetic gene clusters, including satellite clusters and split super-cluster pathways known in fungi. Finally, we may need to develop more than the common Escherichia coli and Saccharomyces cerevisiae host platforms for high-throughput refactoring and functional expression of pathways from a variety of sources, for example to overcome potential co-factor or precursor limitations and product toxicity, or to enable the expression of very large gene clusters. Considering the fast pace at which progress has been and continues to be made in genomics and synthetic biology, and the new methods being developed for compound screening and identification through high-resolution mass spectrometry (Krug and Muller, 2014), we should be optimistic that genomics-driven natural products drug discovery has a bright future. The author's research in natural products biosynthesis has been supported by the National Institutes of Health Grant GM080299.

  • Research Article
  • Citations: 30
  • 10.1128/msystems.00489-21
Comparative Genomics Reveals a Remarkable Biosynthetic Potential of the Streptomyces Phylogenetic Lineage Associated with Rugose-Ornamented Spores
  • Aug 31, 2021
  • mSystems
  • Yoon-Hee Chung + 7 more

ABSTRACT The genus Streptomyces is one of the richest sources of secondary metabolite biosynthetic gene clusters (BGCs). Sequencing of a large number of genomes has provided evidence that this well-known bacterial genus still harbors a large number of cryptic BGCs, and their metabolites are yet to be discovered. When taking a gene-first approach for new natural product discovery, BGC prioritization would be the most crucial step for the discovery of novel chemotypes. We hypothesized that strains with a greater number of BGCs would also contain a greater number of silent unique BGCs due to the presence of complex regulatory systems. Based on this hypothesis, we employed a comparative genomics approach to identify a specific Streptomyces phylogenetic lineage with the highest and yet-uncharacterized biosynthetic potential. A comparison of BGC abundance and genome size across 158 phylogenetically diverse Streptomyces type strains showed that members of the phylogenetic group characterized by the formation of rugose-ornamented spores possess the greatest number of BGCs (average, 50 BGCs) and also the largest genomes (average, 11.5 Mb). The study of genetic and biosynthetic diversities using comparative genomics of 11 sequenced genomes and a genetic similarity network analysis of BGCs suggested that members of this group carry a large number of unique BGCs, the majority of which are cryptic and not associated with any known natural product. We believe that members of this Streptomyces phylogenetic group possess a remarkable biosynthetic potential and thus would be a good target for a metabolite characterization study that could lead to the discovery of novel chemotypes. IMPORTANCE It is now well recognized that members of the genus Streptomyces still harbor a large number of cryptic BGCs in their genomes, which are mostly silent under laboratory culture conditions.
Activation of transcriptionally silent BGCs is technically challenging and thus forms a bottleneck when taking a gene-first approach for the discovery of new natural products. Thus, it is important to focus activation efforts on strains with BGCs that have the potential to produce novel metabolites. The clade-level analysis of biosynthetic diversity could provide insights into the relationship between phylogenetic lineage and biosynthetic diversity. By exploring BGC abundance in relation to Streptomyces phylogeny, we identified a specific monophyletic lineage associated with the highest BGC abundance. Then, using a combined analysis of comparative genomics and a genetic network, we demonstrated that members of this lineage are genetically and biosynthetically diverse, contain a large number of cryptic BGCs with novel genotypes, and thus would be a good target for metabolite characterization studies.

  • Research Article
  • Citations: 49
  • 10.3390/microorganisms7060181
Survey of Biosynthetic Gene Clusters from Sequenced Myxobacteria Reveals Unexplored Biosynthetic Potential.
  • Jun 24, 2019
  • Microorganisms
  • Katherine Gregory + 4 more

Coinciding with the increase in sequenced bacteria, mining of bacterial genomes for biosynthetic gene clusters (BGCs) has become a critical component of natural product discovery. The order Myxococcales, a reputable source of biologically active secondary metabolites, spans three suborders which all include natural product producing representatives. Utilizing the BiG-SCAPE-CORASON platform to generate a sequence similarity network that contains 994 BGCs from 36 sequenced myxobacteria deposited in the antiSMASH database, a total of 843 BGCs with lower than 75% similarity scores to characterized clusters within the MIBiG database are presented. This survey provides the biosynthetic diversity of these BGCs and an assessment of the predicted chemical space yet to be discovered. Considering the mere snapshot of myxobacteria included in this analysis, these untapped BGCs exemplify the potential for natural product discovery from myxobacteria.

  • Research Article
  • 10.1248/yakushi.21-00218
Discovery of Diverse Natural Products from Undeveloped Fungal Gene Resource by Using Epigenetic Regulation
  • May 1, 2022
  • Yakugaku zasshi : Journal of the Pharmaceutical Society of Japan
  • Teigo Asai

The discovery of natural products that possess novel chemical structures and pharmaceutical activities increases opportunities for drug development. Filamentous fungi have been recognized as an attractive source of pharmaceutically beneficial natural products. Advances in genome sequencing, represented by next-generation sequencers, have opened up fungal genomes one after another, suggesting that a single fungal strain has far more biosynthetic gene clusters than would be estimated from the number of previously isolated natural products. In addition, bioinformatics analyses have indicated that most biosynthetic gene clusters are silent under laboratory culture conditions and that a huge number of natural products remain hidden in fungal genomes. Therefore, we focused on those silent biosynthetic gene clusters as a potential source of novel natural products and developed methods to activate them using low molecular weight molecules. In this review, we describe the discovery of novel natural products through activation of silent fungal biosynthesis by the addition of epigenetic modifiers and plant hormones.

  • Research Article
  • Citations: 3
  • 10.1016/j.synbio.2023.03.001
Cost-effective hybrid long-short read assembly delineates alternative GC-rich Streptomyces hosts for natural product discovery
  • Mar 10, 2023
  • Synthetic and systems biotechnology
  • Elena Heng + 11 more

With the advent of rapid automated in silico identification of biosynthetic gene clusters (BGCs), genomics presents vast opportunities to accelerate natural product (NP) discovery. However, prolific NP producers, Streptomyces, are exceptionally GC-rich (>80%) and highly repetitive within BGCs. These pose challenges in sequencing and high-quality genome assembly which are currently circumvented via intensive sequencing. Here, we outline a more cost-effective workflow using multiplex Illumina and Oxford Nanopore sequencing with hybrid long-short read assembly algorithms to generate high quality genomes. Our protocol involves subjecting long read-derived assemblies to up to 4 rounds of polishing with short reads to yield accurate BGC predictions. We successfully sequenced and assembled 8 GC-rich Streptomyces genomes whose lengths range from 7.1 to 12.1 Mb with a median N50 of 8.2 Mb. Taxonomic analysis revealed previous misrepresentation among these strains and allowed us to propose a potentially new species, Streptomyces sydneybrenneri. Further comprehensive characterization of their biosynthetic, pan-genomic and antibiotic resistance features especially for molecules derived from type I polyketide synthase (PKS) BGCs reflected their potential as alternative NP hosts. Thus, the genome assemblies and insights presented here are envisioned to serve as gateway for the scientific community to expand their avenues in NP discovery.
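
The N50 figure quoted in this abstract is a standard contiguity statistic: the contig length at which contigs of that length or longer cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the paper.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp) for a toy assembly, not from the paper.
toy_assembly = [8_200_000, 1_500_000, 300_000, 50_000]
print(n50(toy_assembly))  # 8200000
```

A near-complete bacterial chromosome in a single contig dominates the sum, which is why an N50 close to the genome size signals a highly contiguous assembly.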

  • Research Article
  • 10.1055/s-0034-1382743
Genomics-guided discovery of potent anticancer natural products from exotic bacterial species
  • Jul 14, 2014
  • Planta Medica
  • Yqe Cheng

Genomics-guided natural product discovery is effective and particularly suitable for small research laboratories with very limited resources. Mining the genome of Burkholderia thailandensis E264 revealed a hybrid nonribosomal peptide synthetase-polyketide synthase (NRPS-PKS) biosynthetic gene cluster that resembles that of FK228 (romidepsin, Istodax®), which led us to discover thailandepsins A-F, natural analogues of FK228, and potent histone deacetylase inhibitors and antiproliferative agents with GI50 values in the sub-nM range. Mining the genome of B. thailandensis MSMB43 revealed at least 13 biosynthetic gene clusters. Among them one hybrid NRPS-PKS gene cluster is highly homologous to that of FR901464 (a prototype spliceosome inhibitor) in Pseudomonas sp. No. 2663, which led us to discover thailanstatins A-D, natural and more stable analogues of FR901464, and potent pre-mRNA splicing inhibitors and antiproliferative agents with GI50 values in the low nM range. Selected members of thailandepsins or thailanstatins are under intensive investigations as anticancer drug candidates, and preliminary results are very promising. Collectively, Burkholderia species have proven to be a very good source of potent natural products.

  • Research Article
  • 10.1016/bs.mie.2025.03.005
Targeted genome mining for natural product discovery.
  • Jan 1, 2025
  • Methods in enzymology
  • José D D Cediel-Becerra + 1 more

  • Research Article
  • Citations: 3
  • 10.1186/s12934-025-02722-z
A highly efficient heterologous expression platform to facilitate the production of microbial natural products in Streptomyces
  • May 14, 2025
  • Microbial Cell Factories
  • Xiuling Wang + 14 more

Background: Heterologous expression in Streptomyces provides a platform for mining natural products (NPs) encoded by cryptic biosynthetic gene clusters (BGCs) of bacteria. The BGCs are first engineered in hosts with robust recombineering systems, such as Escherichia coli, followed by expression in optimized heterologous hosts, such as Streptomyces, with defined metabolic backgrounds. Results: We developed a highly efficient heterologous expression platform, named Micro-HEP (microbial heterologous expression platform), that uses versatile E. coli strains capable of both modification and conjugation transfer of foreign BGCs and an optimized chassis Streptomyces strain for expression. The stability of repeat sequences in these E. coli strains was superior to that of the commonly used conjugative transfer system E. coli ET12567 (pUZ8002). For optimizing expression of foreign BGCs, the chassis strain S. coelicolor A3(2)-2023 was generated by deleting four endogenous BGCs followed by introducing multiple recombinase-mediated cassette exchange (RMCE) sites in the S. coelicolor A3(2) chromosome. Additionally, modular RMCE cassettes (Cre-lox, Vika-vox, Dre-rox, and phiBT1-attP) were constructed for integrating BGCs into the chassis strain. Micro-HEP was tested using BGCs for the anti-fibrotic compound xiamenmycin and griseorhodins. Two to four copies of the xim BGC were integrated by RMCE, with increasing copy number associated with increasing yield of xiamenmycin. The grh BGC was also efficiently expressed, and the new compound griseorhodin H was identified. Conclusion: We demonstrated that our Micro-HEP system enables the efficient expression of foreign BGCs, facilitating the discovery of new NPs and increasing yields.

  • Research Article
  • Citations: 35
  • 10.3389/fmicb.2023.1271418
Species-specificity of the secondary biosynthetic potential in Bacillus
  • Oct 23, 2023
  • Frontiers in Microbiology
  • Qun-Jian Yin + 7 more

Introduction: Although Bacillus species have produced a wide variety of structurally diverse and biologically active natural products, the secondary biosynthetic potential of Bacillus species is widely underestimated due to the limited number of biosynthetic gene clusters (BGCs) in this genus. The significant variation in the diversity and novelty of BGCs across different species within the Bacillus genus presents a major obstacle to the efficient discovery of novel natural products from Bacillus. Methods: In this study, the number of each class of BGCs in all 6,378 high-quality Bacillus genomes was predicted using antiSMASH, and the species-specificity of BGC distribution in Bacillus was investigated by principal component analysis. Then the structural diversity and novelty of the predicted secondary metabolites in Bacillus species with specific BGC distributions were analyzed using molecular networking. Results: Our results revealed a certain degree of species-specificity in the distribution of BGCs in Bacillus, which was mainly contributed by siderophore, type III polyketide synthase (T3PKS), and transAT-PKS BGCs. B. wiedmannii, B. thuringiensis, and B. cereus are rich in RiPP-like and siderophore BGCs, but lack T3PKS BGCs, while B. amyloliquefaciens and B. velezensis are abundant in transAT-PKS BGCs. These Bacillus species collectively encode 77,541 BGCs, with NRPS and RiPPs being the two most dominant types, which are further categorized into 4,291 GCFs. Remarkably, approximately 54.5% of GCFs and 93.8% of the predicted metabolite scaffolds are found exclusively in a single Bacillus species. Notably, B. cereus, B. thuringiensis, and B. velezensis exhibit the highest potential for producing species-specific NRPS and PKS bioinformatic natural products. Taking two species-specific NRPS gene clusters as examples, the potential of Bacillus to synthesize novel species-specific natural products is illustrated. Conclusion: This study highlights the species-specificity of the secondary biosynthetic potential in Bacillus and provides valuable insights for the targeted discovery of novel natural products from this genus.
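
The "found exclusively in a single species" statistic in this abstract can be illustrated with a small counting sketch. This is only an assumed reading of how such a fraction might be computed, not the study's actual workflow; the function name `exclusive_gcf_fraction`, the record layout, and the toy GCF labels are all hypothetical.

```python
from collections import defaultdict


def exclusive_gcf_fraction(bgc_records):
    """bgc_records: iterable of (species, gcf_id) pairs, one per BGC.
    Returns the fraction of GCFs whose BGCs all come from one species."""
    species_by_gcf = defaultdict(set)
    for species, gcf_id in bgc_records:
        species_by_gcf[gcf_id].add(species)
    if not species_by_gcf:
        raise ValueError("no BGC records given")
    # A GCF is species-exclusive when exactly one species contributes to it.
    exclusive = sum(1 for members in species_by_gcf.values() if len(members) == 1)
    return exclusive / len(species_by_gcf)


# Toy records with made-up GCF labels, for illustration only.
toy_records = [
    ("B. cereus", "GCF_1"),
    ("B. thuringiensis", "GCF_1"),   # GCF_1 is shared by two species
    ("B. velezensis", "GCF_2"),
    ("B. velezensis", "GCF_2"),
    ("B. cereus", "GCF_3"),
    ("B. amyloliquefaciens", "GCF_4"),
]
print(exclusive_gcf_fraction(toy_records))  # 0.75
```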

  • Book Chapter
  • Citations: 11
  • 10.1016/b978-0-12-409547-2.14627-x
Genome Mining Approaches to Bacterial Natural Product Discovery
  • Jun 3, 2019
  • Reference Module in Chemistry, Molecular Sciences and Chemical Engineering
  • Nadine Ziemert + 2 more

  • Research Article
  • Citations: 17
  • 10.1016/j.engmic.2022.100060
Next-generation synthetic biology approaches for the accelerated discovery of microbial natural products
  • Nov 19, 2022
  • Engineering Microbiology
  • Lei Li

  • Research Article
  • Citations: 72
  • 10.1016/j.fgb.2016.01.006
Computational strategies for genome-based natural product discovery and engineering in fungi
  • Jan 13, 2016
  • Fungal Genetics and Biology
  • Theo A.J Van Der Lee + 1 more

Fungal natural products possess biological activities that are of great value to medicine, agriculture and manufacturing. Recent metagenomic studies accentuate the vastness of fungal taxonomic diversity, and the accompanying specialized metabolic diversity offers a great and still largely untapped resource for natural product discovery. Although fungal natural products show an impressive variation in chemical structures and biological activities, their biosynthetic pathways share a number of key characteristics. First, genes encoding successive steps of a biosynthetic pathway tend to be located adjacently on the chromosome in biosynthetic gene clusters (BGCs). Second, these BGCs are often located in specific regions of the genome and show a discontinuous distribution among evolutionarily related species and isolates. Third, the same enzyme (super)families are often involved in the production of widely different compounds. Fourth, genes that function in the same pathway are often co-regulated, and therefore co-expressed across various growth conditions. In this mini-review, we describe how these partly interlinked characteristics can be exploited to computationally identify BGCs in fungal genomes and to connect them to their products. Particular attention will be given to novel algorithms to identify unusual classes of BGCs, as well as integrative pan-genomic approaches that use a combination of genomic and metabolomic data for parallelized natural product discovery across multiple strains. Such novel technologies will not only expedite the natural product discovery process, but will also allow the assembly of a high-quality toolbox for the re-design or even de novo design of biosynthetic pathways using synthetic biology approaches.

  • Research Article
  • Citations: 14
  • 10.3390/md19030142
Coupling Mass Spectral and Genomic Information to Improve Bacterial Natural Product Discovery Workflows.
  • Mar 5, 2021
  • Marine Drugs
  • Max Crüsemann

Bacterial natural products possess potent bioactivities and high structural diversity and are typically encoded in biosynthetic gene clusters. Traditional natural product discovery approaches rely on UV- and bioassay-guided fractionation and are limited in terms of dereplication. Recent advances in mass spectrometry, sequencing and bioinformatics have led to large-scale accumulation of genomic and mass spectral data that is increasingly used for signature-based or correlation-based mass spectrometry genome mining approaches that enable rapid linking of metabolomic and genomic information to accelerate and rationalize natural product discovery. In this mini-review, these approaches are presented, and discovery examples provided. Finally, future opportunities and challenges for paired omics-based natural products discovery workflows are discussed.

  • Research Article
  • Citations: 33
  • 10.1128/msystems.00208-17
LuxR Homolog-Linked Biosynthetic Gene Clusters in Proteobacteria
  • Mar 27, 2018
  • mSystems
  • Carolyn A Brotherton + 2 more

Microbes are a major source of antibiotics, pharmaceuticals, and other bioactive compounds. The production of many specialized microbial metabolites is encoded in biosynthetic gene clusters (BGCs). A challenge associated with natural product discovery is that many BGCs are not expressed under laboratory growth conditions. Here we report a genome-mining approach to discover BGCs with luxR-type quorum sensing (QS) genes, which code for regulatory proteins that control gene expression. Our results show that BGCs linked to genes coding for LuxR-like proteins are widespread in Proteobacteria. In addition, we show that associations between luxR homolog genes and BGCs have evolved independently many times, with functionally diverse gene clusters. Overall, these clusters may provide a source of new natural products for which there is some understanding about how to elicit production. IMPORTANCE Bacteria biosynthesize specialized metabolites with a variety of ecological functions, including defense against other microbes. Genes that code for specialized metabolite biosynthetic enzymes are frequently clustered together. These BGCs are often regulated by a transcription factor encoded within the cluster itself. These pathway-specific regulators respond to a signal or indirectly through other means of environmental sensing. Many specialized metabolites are not produced under laboratory growth conditions, and one reason for this issue is that laboratory growth media lack environmental cues necessary for BGC expression. Here, we report a bioinformatics study that reveals that BGCs are frequently linked to genes coding for LuxR family QS-responsive transcription factors in the phylum Proteobacteria. The products of these luxR homolog-associated gene clusters may serve as a practical source of bioactive metabolites.

  • Research Article
  • 10.1038/s41598-026-49955-5
Unified genomic and chemical representations enable bidirectional biosynthetic gene cluster and natural product retrieval.
  • May 9, 2026
  • Scientific reports
  • Guimei Liu + 7 more

Natural product discovery is increasingly driven by the ability to analyze microbial genomes for biosynthetic gene clusters (BGCs) that encode secondary metabolites. While existing approaches have successfully linked BGCs to broad classes of chemical products, they typically operate in a single modality (genomic or chemical), limiting the scope of bidirectional prediction. In this work, we propose a multimodal framework that integrates genomic and chemical information by projecting embeddings derived from pretrained language models into a common representation space. We embed genomic sequences using a BGC foundation model and represent molecules through a chemical language model, then use a metric learning model to co-embed BGCs and their associated chemical structures. This co-embedding space allows us to quantify the similarity between BGCs and compounds using similarity measures, enabling both efficient forward and inverse retrieval tasks. Our approach consistently outperforms the non-alignment approach and represents a generalizable, scalable strategy to bridge biological and chemical modalities in natural product discovery.
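
The forward and inverse retrieval described in this abstract can be pictured as nearest-neighbour search in a shared vector space. The sketch below assumes cosine similarity as the similarity measure and uses hand-written toy vectors in place of learned embeddings; the abstract does not specify the measure, and the names `cosine`, `retrieve`, `bgc_A` and `compound_X` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def retrieve(query, candidates, top_k=1):
    """Rank candidate names by cosine similarity to the query, best first."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy vectors standing in for co-embedded BGCs and compounds (assumed values).
bgc_embeddings = {"bgc_A": [0.9, 0.1, 0.0], "bgc_B": [0.0, 0.2, 0.9]}
compound_embeddings = {"compound_X": [1.0, 0.0, 0.1], "compound_Y": [0.1, 0.1, 1.0]}

# Forward retrieval: BGC -> most similar compound.
forward = retrieve(bgc_embeddings["bgc_A"], compound_embeddings)
# Inverse retrieval: compound -> most similar BGC.
inverse = retrieve(compound_embeddings["compound_Y"], bgc_embeddings)
print(forward, inverse)
```

Because both modalities live in one space, the same ranking function serves both directions; only the query and the candidate set are swapped.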
