De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

Jens Keilwagen, Ivo Grosse, Stefan Posch, Jan Grau, Ivan A Paponov, Marc Strickert, Harmen J Bussemaker

doi:10.1371/journal.pcbi.1001070

Abstract

Transcription factors are a main component of gene regulation, as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been fully solved. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from microarray, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point, such as the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.

Highlights

Gene regulation is a complex process controlled by many influential components such as the binding of proteins to DNA or the binding of miRNAs to mRNA, RNA editing, splicing of pre-mRNA, mRNA degradation, or post-translational modification.
Gene regulation and the binding of transcription factors (TFs) to their binding sites (BSs) are of fundamental interest in many areas of genome biology.
Searching for differentially abundant motifs and learning a position distribution have each been shown to be promising in separate experiments.

Summary

Introduction

Gene regulation is a complex process controlled by many influential components such as the binding of proteins to DNA or the binding of miRNAs to mRNA, RNA editing, splicing of pre-mRNA, mRNA degradation, or post-translational modification. A wealth of de-novo motif discovery tools has been developed over the last decades including, for example, Gibbs Sampler [9,10,11], MEME [12], Weeder [13], Improbizer [14], DME [15], DEME [16], or A-GLAM [17]. These tools differ by the learning principle employed to infer the model parameters and by their capability of learning the position distribution of the BSs from the data. None of the existing tools is capable of searching for differentially abundant BSs and learning the positional distribution simultaneously, and developing such a tool is the goal of this work. As this tool is capable of modeling the positional preference of TFBSs using a discriminative learning principle, we call it Dispom, a tool for discriminative de-novo position distribution and motif discovery. We compare the motif found by Dispom with the canonical auxin-responsive element and test how specific these motifs are at predicting auxin-responsive genes for an independent data set.
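To make the two notions combined here concrete, the following is a minimal Python sketch of what "differentially abundant" and "positional preference" mean for a fixed motif. It is not Dispom's method: Dispom learns a probabilistic motif model, its length, and the position distribution discriminatively, whereas this sketch only counts exact matches of a given string. The function names, the toy sequences, and the bin width are hypothetical and chosen for illustration; the motif TGTCTC is the commonly cited canonical auxin-responsive element, and sequences are assumed to be aligned so that their 3' end is the anchor point (for example, the transcription start site).

```python
from collections import Counter

def occurrence_positions(seq, motif):
    """Start positions of exact motif matches, relative to the 3' end (anchor = 0)."""
    k = len(motif)
    return [i - len(seq) for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

def differential_abundance(foreground, background, motif):
    """Fraction of sequences containing the motif in each set, and the difference."""
    fg = sum(1 for s in foreground if motif in s) / len(foreground)
    bg = sum(1 for s in background if motif in s) / len(background)
    return fg, bg, fg - bg

def position_histogram(seqs, motif, bin_width):
    """Count motif occurrences per position bin (bins labeled by their lower edge)."""
    hist = Counter()
    for s in seqs:
        for pos in occurrence_positions(s, motif):
            hist[pos // bin_width * bin_width] += 1
    return dict(hist)

# Toy, made-up sequences (not real promoters), all of length 12.
MOTIF = "TGTCTC"
foreground = ["AAAATGTCTCAA", "CCTGTCTCGGGG", "ACGTACGTACGT"]
background = ["ACGTACGTACGT", "GGGGCCCCAAAA", "TTTTGTCTCAAA"]

fg_frac, bg_frac, diff = differential_abundance(foreground, background, MOTIF)
fg_hist = position_histogram(foreground, MOTIF, bin_width=5)
```

In this toy setting, a motif is differentially abundant when `diff` is clearly positive, and a positional preference shows up as a histogram concentrated in a few bins near the anchor. A tool that handles both aspects jointly has to infer the motif and the position distribution from the data rather than take them as given, which is the gap the introduction describes.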

Materials and Methods

Findings

Conclusions