Abstract

Independent component analysis (ICA) of bacterial transcriptomes has emerged as a powerful tool for obtaining co-regulated, independently-modulated gene sets (iModulons), inferring their activities across a range of conditions, and enabling their association to known genetic regulators. By grouping and analyzing genes based on observations from big data alone, iModulons can provide a novel perspective into how the composition of the transcriptome adapts to environmental conditions. Here, we present iModulonDB (imodulondb.org), a knowledgebase of prokaryotic transcriptional regulation computed from high-quality transcriptomic datasets using ICA. Users select an organism from the home page and then search or browse the curated iModulons that make up its transcriptome. Each iModulon and gene has its own interactive dashboard, featuring plots and tables with clickable, hoverable, and downloadable features. This site enhances research by presenting scientists of all backgrounds with co-expressed gene sets and their activity levels, which lead to improved understanding of regulator-gene relationships, discovery of transcription factors, and the elucidation of unexpected relationships between conditions and genetic regulatory activity. The current release of iModulonDB covers three organisms (Escherichia coli, Staphylococcus aureus and Bacillus subtilis) with 204 iModulons, and can be expanded to cover many additional organisms.

Highlights

  • The transcriptional regulatory network (TRN) governs gene expression in response to environmental stimuli, which is of fundamental interest in biology

  • We developed iModulonDB to enhance the field of microbial genetic regulation by presenting TRNs based on observed signals in transcriptomic datasets

  • We hope that iModulonDB will become an important part of the database ecosystem, providing a machine learning-derived perspective that links to other databases for synergetic TRN characterization



Introduction

The transcriptional regulatory network (TRN) governs gene expression in response to environmental stimuli, which is of fundamental interest in biology. The falling price of RNA sequencing has led to a rapid growth in online transcriptomic databases [8,9], creating a strong need for analytical tools that can harness the scale of these databases to transform raw data into biologically meaningful information [10]. For transcriptomic data, this knowledge comes in the form of (i) identifying which regulons are active in each condition probed in the dataset, (ii) generating hypotheses about gene function and regulation and (iii) revealing novel relationships and patterns in bacterial lifestyles. Traditional methods, such as chromatin immunoprecipitation (ChIP) assays [11], can be time-consuming and expensive, making them cumbersome for high-throughput discovery or hypothesis generation. They also do not yield the condition-specific strength of binding, which can be inferred by machine learning. Another strength of data-driven approaches is that they can be applied to any organism, regardless of

