Abstract

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with that of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.
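The two evaluation measures named above, AUPRC and precision at a fixed level of recall, can be illustrated with a minimal sketch. This is not Promotech's evaluation code: the function names, the toy labels and scores, and the step-wise (average-precision) way of summing the area are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    labels: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) pairs as the score threshold is lowered.

    labels: 1 for a true promoter, 0 for a non-promoter (toy convention).
    scores: predicted promoter scores, higher meaning more promoter-like.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Sequences with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((tp / total_pos, tp / (tp + fp)))
        i = j
    return points


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in precision_recall_points(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def precision_at_recall(
    labels: Sequence[int], scores: Sequence[float], target_recall: float
) -> float:
    """Best precision among thresholds that reach at least target_recall."""
    return max(
        precision
        for recall, precision in precision_recall_points(labels, scores)
        if recall >= target_recall
    )


# Hypothetical toy data: three promoters and three non-promoters.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auprc(toy_labels, toy_scores), 4))
print(precision_at_recall(toy_labels, toy_scores, 1.0))
```

Comparing tools by precision at the same recall means picking, for each tool, the threshold that recovers the same fraction of known promoters and then asking how many of its predictions at that threshold are correct.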

Highlights

  • Promoters are DNA segments essential for the initiation of transcription at a defined location in the genome, which are recognized by a specific RNA polymerase (RNAP) holoenzyme (Eσ) [1]

  • Variety of training and validation data: We obtained a large number of promoter sequences from published global transcription start site (TSS) maps. Both the training and the validation data include bacterial species belonging to distinct phyla and spanning a wide range of GC content (Tables 1 and 2)

  • To have visual representations of these results, each nucleotide’s importance score was plotted on a bar graph (Figs. 2 and 3). These results suggest that having adenine (A) and thymine (T) in the range of −8 to −12 relative to the TSS is highly important for promoter recognition


Summary

Introduction

Promoters are DNA segments essential for the initiation of transcription at a defined location in the genome, which are recognized by a specific RNA polymerase (RNAP) holoenzyme (Eσ) [1]. Sigma (σ) factors are bacterial DNA-binding proteins that regulate transcription initiation by enabling specific binding of RNAP to promoters [1]. Numerous bioinformatics tools have been developed to recognize bacterial promoter sequences [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] (summarized in Supplementary Table S1). The performance of current tools decreases rapidly when they are applied to whole genomes, and it is common practice to restrict the size of the input sequence to a few hundred nucleotides.
