Abstract

A large part of our current understanding of gene regulation in Gram-positive bacteria is based on Bacillus subtilis, as it is one of the best-studied bacterial model systems. The rapidly growing data on its molecular and genomic biology are distributed across multiple annotation resources. Consequently, the interpretation of data from further B. subtilis experiments becomes increasingly challenging in both small- and large-scale analyses. Additionally, the B. subtilis annotation of structured RNA and non-coding RNA (ncRNA), as well as of the operon structure, still lags behind the annotation of the coding sequences. To address these challenges, we created the B. subtilis genome atlas, BSGatlas, which integrates and unifies multiple existing annotation resources. Compared to any of the individual resources, the BSGatlas contains twice as many ncRNAs, while improving the positional annotation for 70 % of the ncRNAs. Furthermore, we combined known transcription start and termination sites with lists of known co-transcribed gene sets to create a comprehensive transcript map. The combination with transcription start/termination site annotations resulted in 717 new sets of co-transcribed genes and 5335 untranslated regions (UTRs). In comparison to existing resources, the number of 5′ and 3′ UTRs increased nearly fivefold, and the number of internal UTRs doubled. The transcript map is organized in 2266 operons, which provide transcriptional annotation for 92 % of all genes in the genome, compared to at most 82 % in previous resources. We predicted an off-target-aware genome-wide library of CRISPR–Cas9 guide RNAs, which we also linked to polycistronic operons. We provide the BSGatlas in multiple forms: as a website (https://rth.dk/resources/bsgatlas/), as an annotation hub for display in the UCSC genome browser, as supplementary tables and in the standardized GFF3 format, which can be used in large-scale -omics studies.
By complementing existing resources, the BSGatlas supports analyses of the B. subtilis genome and its molecular biology with respect to not only non-coding genes but also genome-wide transcriptional relationships of all genes.
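As an illustration of how a GFF3 annotation of this kind might be used in a scripted analysis, the following is a minimal Python sketch. It relies only on the general GFF3 layout (nine tab-separated columns, with `key=value` attributes in the last column). The sample records, the feature type names (`gene`, `ncRNA`, `operon`) and the helper functions are assumptions made for this example and are not taken from the BSGatlas files, whose actual type names and attributes may differ.

```python
from collections import Counter

# Hypothetical GFF3 excerpt: coordinates, IDs and feature types are made up
# for illustration and do not come from the BSGatlas download.
SAMPLE_GFF3 = """\
##gff-version 3
chr\texample\tgene\t410\t1750\t.\t+\t.\tID=gene1;Name=dnaA
chr\texample\tgene\t1939\t3075\t.\t+\t.\tID=gene2;Name=dnaN
chr\texample\tncRNA\t3200\t3290\t.\t-\t.\tID=rna1;Name=exampleRNA
chr\texample\toperon\t350\t3100\t.\t+\t.\tID=operon1
"""


def parse_gff3(text):
    """Yield one dict per feature line of a GFF3 document."""
    for line in text.splitlines():
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),  # GFF3 coordinates are 1-based, inclusive
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        }


def count_types(features):
    """Count how many features of each type are present."""
    return Counter(f["type"] for f in features)


def genes_within_operons(features):
    """Map each operon ID to the names of genes it fully contains (same strand)."""
    operons = [f for f in features if f["type"] == "operon"]
    genes = [f for f in features if f["type"] == "gene"]
    result = {}
    for op in operons:
        result[op["attributes"]["ID"]] = [
            g["attributes"].get("Name", g["attributes"].get("ID"))
            for g in genes
            if g["seqid"] == op["seqid"]
            and g["strand"] == op["strand"]
            and op["start"] <= g["start"]
            and g["end"] <= op["end"]
        ]
    return result


if __name__ == "__main__":
    features = list(parse_gff3(SAMPLE_GFF3))
    print(count_types(features))
    print(genes_within_operons(features))
```

To apply the same functions to a downloaded file, the file contents would be read into a string and passed to `parse_gff3`, with the type names adjusted to those actually used in the file.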

Highlights

  • Bacillus subtilis (Firmicutes, Bacilli) is a Gram-positive soil micro-organism that is central to multiple research fields

  • The current annotations of B. subtilis focus on protein-coding sequences (CDSs) [8]. Insofar as genome coordinate annotations exist for structured RNA elements, non-coding RNA genes and untranslated regions (UTRs) of mRNAs, they are difficult to access, particularly for high-throughput use

  • Given the non-bacterial hits, we excluded the relaxed scan from the merging, yet we provide it as supplementary information for more putative non-coding RNA (ncRNA) candidates

Introduction

Bacillus subtilis (Firmicutes, Bacilli) is a Gram-positive soil micro-organism that is central to multiple research fields. It is widely used as a model system for the study of gene regulation and is probably the best-studied bacterial species apart from Escherichia coli. In industrial applications, it is used as a host organism for the production of enzymes and other proteins [1]. The current annotations of B. subtilis focus on protein-coding sequences (CDSs) [8]. Insofar as genome coordinate annotations exist for structured RNA elements, non-coding RNA (ncRNA) genes and untranslated regions (UTRs) of mRNAs, they are difficult to access, particularly for high-throughput use. Full mRNA transcripts are rarely annotated, which constrains the study of post-transcriptional regulation.

Microbial Genomics 2021
