Abstract

Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available, it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.

Results: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition, our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.

Conclusions: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.

Highlights

  • In the Results section we present an analysis of the binding sites in the TRANSFAC database and a comparison of our method to several others: MEME, one of the most successful and popular standard motif finders; GLAM2, the best variable length motif finder known to us; and our own method but with the possibility of a gap switched off

  • These are in addition to the 10 position weight matrices (PWMs) published in TRANSFAC for which gaps had already been introduced into the sequences used to define them

Summary

Introduction

Models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). We are a long way from determining the binding sites of all transcription factors in all conditions. Until we have this experimental data, mathematical models of binding sites will help us predict TFBSs and, in turn, infer regulatory effects. These models may reveal combined binding sites of a transcription factor and its co-factors [1] and can be used to identify binding sites in species for which experimental binding data is not available. Such models can explain variation in binding affinities [2,3] that can have a functional effect. Building such models is a crucial task in current bioinformatics research.
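To make the contrast between fixed-length and variable-length sites concrete, the following is a minimal sketch of log-odds PWM scoring, extended so that one column may be skipped. It is an illustration only and not the model or search algorithm described in this paper: the function names, the toy counts in `TOY_COUNTS`, the uniform background, the pseudocount and the `gap_penalty` parameter are all assumptions made for the example.

```python
import math

BASES = "ACGT"

# Hypothetical toy counts for a 4-column motif (not taken from TRANSFAC).
TOY_COUNTS = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 2, "C": 2, "G": 2, "T": 2},  # uninformative column, treated as optional below
    {"A": 0, "C": 0, "G": 8, "T": 0},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return pwm


def score_fixed(pwm, site):
    """Score a site whose length equals the PWM width (the classical fixed-length case)."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def score_gapped(pwm, site, gap_index, gap_penalty=0.0):
    """Score a site that is either full length or missing the optional column gap_index."""
    if len(site) == len(pwm):
        return score_fixed(pwm, site)
    if len(site) == len(pwm) - 1:
        # Drop the optional column and score the shorter site against the rest.
        columns = pwm[:gap_index] + pwm[gap_index + 1:]
        return sum(column[base] for column, base in zip(columns, site)) - gap_penalty
    raise ValueError("site length must be the PWM width or one less")


if __name__ == "__main__":
    pwm = log_odds_pwm(TOY_COUNTS)
    print(score_fixed(pwm, "ACTG"))               # full-length site
    print(score_gapped(pwm, "ACG", gap_index=2))  # site lacking the optional column
```

In this sketch a fixed-length scorer would have to reject the three-base site outright, whereas the gapped variant can score both forms with one set of parameters, which is the kind of flexibility a variable length model is meant to capture.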
