Abstract

Background: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming.

Results: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results.

Conclusions: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin.
MotifLab is freely available at http://www.motiflab.org.

Highlights

  • Traditional methods for computational motif discovery often suffer from poor performance

  • The paper presents three examples of practical applications using MotifLab, which illustrate some benefits of incorporating additional information when analysing regulatory sequences

  • The features chosen as a basis for the priors track were: conservation, conserved peaks, DNase hypersensitive sites, general regions bound by transcription factors according to ChIP-Seq data, CpG-islands, gene regions, coding regions, repeat regions and regions with histone marks H3K4me1 and H3K4me3

Summary

Introduction

Traditional methods for computational motif discovery often suffer from poor performance. The motivation for combining several methods is that, while one individual method might be mistaken in a single case, any motif predicted by several different methods is more likely to be correct. Tools such as Melina [3] and Tmod [4] let users run and compare results from several methods within a unified interface, and ensemble methods, like EMD [5] and MotifVoter [6], can take predictions from multiple methods as input and automatically derive a consensus. Motif discovery is difficult in the first place because binding motifs are often rather short and can vary substantially between binding sites. This makes them hard to discover with de novo motif discovery methods, since the signal-to-noise ratio can be quite low when searching for motifs embedded in long background sequences. Several module discovery methods have been proposed to search for groups of motifs that form cis-regulatory modules [7].
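
The idea of deriving a consensus from several methods can be illustrated with a minimal sketch. It is not the algorithm used by EMD, MotifVoter or MotifLab; it simply assumes that each method reports predicted binding sites as half-open intervals on one sequence, and keeps the regions supported by at least a chosen number of methods. The function name `consensus_regions`, the method names and the coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int]  # half-open interval [start, end) in sequence coordinates


def consensus_regions(
    predictions: Dict[str, List[Site]],
    seq_length: int,
    min_votes: int = 2,
) -> List[Site]:
    """Return maximal regions covered by predictions from at least min_votes methods."""
    votes = [0] * seq_length
    for sites in predictions.values():
        # Collect the positions covered by this method so that overlapping
        # sites from the same method count as a single vote per position.
        covered = set()
        for start, end in sites:
            covered.update(range(max(start, 0), min(end, seq_length)))
        for pos in covered:
            votes[pos] += 1

    # Merge consecutive positions that reach the vote threshold into regions.
    regions: List[Site] = []
    region_start = None
    for pos, count in enumerate(votes):
        if count >= min_votes and region_start is None:
            region_start = pos
        elif count < min_votes and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, seq_length))
    return regions


# Hypothetical predictions from three methods on a 100 bp sequence.
predictions = {
    "method_a": [(10, 18), (40, 48)],
    "method_b": [(12, 20)],
    "method_c": [(11, 19), (70, 78)],
}
result = consensus_regions(predictions, seq_length=100, min_votes=2)
print(result)  # [(11, 19)]
```

Only the region around positions 11 to 19 is reported, because the sites at 40 and 70 are each predicted by a single method. Raising `min_votes` trades sensitivity for specificity, which is the basic compromise any voting scheme of this kind has to make.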
