Abstract

Motivation: Transcription factors play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a transcription factor can be described in terms of position frequency matrices. When scanning a sequence for matches to a position frequency matrix, one needs to determine a cut-off, which then in turn results in a certain number of hits. In this paper we describe how to compute the distribution of match scores and of the number of motif hits, which are the prerequisites to perform motif hit enrichment analysis.

Results: We put forward an improved compound Poisson model that supports general order-d Markov background models and which computes the number of motif hits more accurately than earlier models. We compared the accuracy of the improved compound Poisson model with previously proposed models across a range of parameters and motifs, demonstrating the improvement. The importance of the order-d model is supported in a case study using CpG-island sequences.

Availability and implementation: The method is available as a Bioconductor package named 'motifcounter' at https://bioconductor.org/packages/motifcounter.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Transcription factors (TFs) play an essential role in the regulation of gene expression. They function by binding to short sequences known as transcription factor binding sites (TFBSs) which are typically located in promoter or enhancer regions (Alberts et al, 2002)

  • We presented an improved compound Poisson model based on Pape et al (2008)

Summary

Introduction

Transcription factors (TFs) play an essential role in the regulation of gene expression. They function by binding to short sequences known as transcription factor binding sites (TFBSs), which are typically located in promoter or enhancer regions (Alberts et al, 2002). Since the motifs typically lack specificity, the need arises to determine the statistical significance of a motif match and to delineate how many matches of a motif one would expect to find in a sequence by chance. Relative to this information, TFBS enrichment can subsequently be inferred for the sequences of interest, e.g. a set of promoters (Pape et al, 2008; Thomas-Chollier et al, 2008). These programs are at the core of the motif enrichment approach, where a set of sequences is scanned for motifs that are found in those sequences more often than expected by chance (Frith et al, 2004; McLeay and Bailey, 2010; Roider et al, 2009; Zambelli et al, 2009).
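To make the notions of a cut-off, a motif hit and the number of hits expected by chance concrete, the following is a minimal sketch in Python. It is not the compound Poisson model described in this paper and does not use the motifcounter package. The 3-position matrix, the uniform order-0 background, and the function names (`score`, `count_hits`, `hit_probability`, `expected_hits`) are all hypothetical choices made for illustration, and only one strand is scanned.

```python
import itertools
import math

ALPHABET = "ACGT"

# Hypothetical position frequency matrix (rows: positions; columns: A, C, G, T).
PFM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

# Assumption: order-0 (i.i.d.) uniform background, chosen for simplicity.
# The paper's model supports general order-d Markov backgrounds.
BACKGROUND = [0.25, 0.25, 0.25, 0.25]


def score(window):
    """Log-odds score: sum over positions of log(motif prob / background prob)."""
    total = 0.0
    for pos, base in enumerate(window):
        j = ALPHABET.index(base)
        total += math.log(PFM[pos][j] / BACKGROUND[j])
    return total


def count_hits(sequence, cutoff):
    """Number of windows (single strand) whose score reaches the cut-off."""
    m = len(PFM)
    return sum(
        1
        for i in range(len(sequence) - m + 1)
        if score(sequence[i:i + m]) >= cutoff
    )


def hit_probability(cutoff):
    """Probability that one background window scores >= cutoff.

    Uses exhaustive enumeration of all 4**m windows, which is only
    feasible for short motifs.
    """
    m = len(PFM)
    prob = 0.0
    for window in itertools.product(ALPHABET, repeat=m):
        if score(window) >= cutoff:
            p = 1.0
            for base in window:
                p *= BACKGROUND[ALPHABET.index(base)]
            prob += p
    return prob


def expected_hits(seq_len, cutoff):
    """Expected number of hits: number of windows times per-window hit probability."""
    return (seq_len - len(PFM) + 1) * hit_probability(cutoff)


if __name__ == "__main__":
    seq = "TTACGACGTT"
    print(count_hits(seq, 3.0))
    print(expected_hits(len(seq), 3.0))
```

By linearity of expectation, the expected count is simply the number of windows times the per-window hit probability, even though neighbouring windows overlap. The full distribution of the hit count is harder to obtain because overlapping hits are not independent, and that distribution is the quantity the models compared in this paper aim to approximate.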
