SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics

Fabrice Touzain, Bertrand Aigle, Sophie Schbath, Gregory Kucherov, Isabelle Debled-Rennesson, Pierre Leblond

doi:10.1186/1471-2105-9-73

Abstract

Background: Many programs have been developed to identify transcription factor binding sites. However, most of them are not able to infer two-word motifs with variable spacer lengths. This case is encountered for RNA polymerase Sigma (σ) Factor Binding Sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm detecting SFBSs by using combinational and statistical constraints deduced from biological observations.

Results: We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as the selection criterion for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under the guidance of statistical considerations, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A cross-check against the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found, and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs were defined, overlapping previously proposed consensuses. Relevant tests were also carried out on bacteria with moderate GC content (i.e. the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping σ factors were found, as well as other SFBSs such as that of SigW in Bacillus strains.

Conclusion: We demonstrate that our approach, which combines statistical and biological criteria, successfully predicts SFBSs. The versatility of the method allows the recognition of other kinds of two-box regulatory sites.

Highlights

Many programs have been developed to identify transcription factor binding sites
Most of them are not able to infer two-word motifs with variable spacer lengths
Occurrences of Sigma (σ) Factor Binding Sites (SFBSs) are known to be rare in a genome, because useless occurrences can be a handicap for the bacterium, which has to withstand selection pressure

Summary

Introduction

Many programs have been developed to identify transcription factor binding sites (TFBSs). However, most of them are not able to infer two-word motifs with variable spacer lengths. This case is encountered for RNA polymerase Sigma (σ) Factor Binding Sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Many algorithms are devoted to single-motif prediction [2,3,4,5,6,7,8,9,10,11]. They include genetic algorithms [10], expectation maximization or Gibbs sampling methods [2,5,7], methods incorporating phylogenetic data [11], and other methods often based on multiple alignments [4,6] or statistical over-representation [12]. These approaches can identify some kinds of TFBSs, but they are not adapted to regulatory binding sites composed of two boxes (a box refers to a conserved part of a signal modelled by a word). This two-box structure with a variable spacer is not tackled by most of the existing methods, such as the popular MEME program [2].
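To make the notion of a two-box motif with a variable-length spacer concrete, the following minimal Python sketch scans a sequence for a -35 box followed by a -10 box separated by a spacer whose length lies in a given range. It is an illustration only, not the SIGffRid algorithm: it uses exact word matching, whereas SIGffRid relies on over-represented patterns, seed pairs and statistically guided extension. The function name `find_two_box_sites`, the default spacer range (15 to 19) and the toy sequence are assumptions introduced for this example; the boxes TTGACA and TATAAT are the textbook E. coli σ70 consensus, used here only as familiar test words.

```python
def find_two_box_sites(seq, box35, box10, min_spacer=15, max_spacer=19):
    """Return (start, spacer_length) for each occurrence of box35 followed by
    box10 with a spacer of min_spacer..max_spacer nucleotides in between.

    Exact matching only; hypothetical helper, not part of SIGffRid.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(box35)
    while start != -1:
        end35 = start + len(box35)
        # Try every allowed spacer length after this -35 box occurrence.
        for spacer in range(min_spacer, max_spacer + 1):
            if seq.startswith(box10, end35 + spacer):
                hits.append((start, spacer))
        # Continue from the next position so overlapping -35 boxes are found.
        start = seq.find(box35, start + 1)
    return hits


# Toy promoter: -35 box, 17-nt spacer, -10 box (illustrative sequence).
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(find_two_box_sites(toy, "TTGACA", "TATAAT"))  # [(2, 17)]
```

A single-motif finder would have to treat each spacer length as a separate, longer word, which dilutes the signal across several patterns; scanning the two boxes jointly over a range of spacer lengths keeps them together as one motif.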

Objectives

Results

Discussion

Conclusion