USING SIMPLE RULES ON PRESENCE AND POSITIONING OF MOTIFS FOR PROMOTER STRUCTURE MODELING AND TISSUE-SPECIFIC EXPRESSION PREDICTION

Alexis Vandenbon, Kenta Nakai

doi:10.1142/9781848163324_0016

Abstract

Transcription is regulated by sets of transcription factors that bind specific sites in the regulatory regions of genes. Regulatory regions that drive similar expression profiles are therefore believed to share common structural features. Here we introduce a computational approach for finding a small set of rules that describe the presence and positioning of motifs in a set of promoter sequences. This rule set is subsequently used to find promoters that drive similar expression profiles in a genomic set of sequences. We applied our approach to muscle-expressed genes in Caenorhabditis elegans. We obtained a high average performance, and in the best case almost 50% of true positive test genes scored higher than 90% of the true negative test genes. High-scoring non-training sequences were enriched for muscle-expressed genes, and predicted motifs fitting the rules showed a significant tendency to be present in experimentally verified regulatory regions. Our model is more general than existing cis-regulatory module models, because the rules it selects contain a variety of information: not only proximal but also distal positioning of pairs of motifs, positioning relative to the translation start site, and the simple presence of motifs. We believe our model can help to increase our understanding of transcription factor cooperation and transcription initiation.
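To make the three kinds of rules and the reported performance measure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a promoter is represented as a mapping from motif names to hit positions (in bp upstream of the translation start site), that each rule carries a weight, and that a promoter's score is the sum of the weights of the rules it satisfies. The motif names, distances, weights, and function names are all hypothetical. The last function computes the fraction of positive genes that score higher than a given fraction of the negative genes, which is how the "almost 50% ... higher than 90%" figure can be read.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: motif name -> hit positions (bp upstream of
# the translation start site). Not taken from the paper.
Hits = Dict[str, List[int]]
Rule = Callable[[Hits], bool]


def presence(motif: str) -> Rule:
    """Rule: the motif occurs anywhere in the promoter."""
    return lambda hits: bool(hits.get(motif))


def near_start(motif: str, max_dist: int) -> Rule:
    """Rule: the motif occurs within max_dist bp of the translation start site."""
    return lambda hits: any(p <= max_dist for p in hits.get(motif, []))


def pair_distance(a: str, b: str, min_d: int, max_d: int) -> Rule:
    """Rule: motifs a and b occur min_d..max_d bp apart (proximal or distal)."""
    return lambda hits: any(
        min_d <= abs(pa - pb) <= max_d
        for pa in hits.get(a, [])
        for pb in hits.get(b, [])
    )


def score(hits: Hits, rules: List[Tuple[Rule, float]]) -> float:
    """Sum the (assumed) weights of all rules the promoter satisfies."""
    return sum(weight for rule, weight in rules if rule(hits))


def fraction_above_negatives(pos: List[float], neg: List[float],
                             fraction: float = 0.9) -> float:
    """Fraction of positives scoring strictly higher than `fraction` of negatives."""
    ordered = sorted(neg)
    needed = fraction * len(ordered)
    # bisect_left counts the negatives with a score strictly below p.
    above = sum(1 for p in pos if bisect_left(ordered, p) >= needed)
    return above / len(pos)


# Illustrative rule set with made-up motifs, distances, and weights.
rules = [
    (presence("M1"), 1.0),
    (near_start("M2", 200), 2.0),
    (pair_distance("M1", "M2", 700, 800), 1.5),
]
promoter_a = {"M1": [120], "M2": [150, 900]}
promoter_b = {"M2": [900]}
score_a = score(promoter_a, rules)
score_b = score(promoter_b, rules)
```

Expressing every rule as a predicate over the same hit table keeps presence rules, start-site rules, and pairwise distance rules interchangeable, so a rule set can mix them freely.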

Full Text