Abstract

Background: Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology. Bioinformatics analyses of regulatory regions are hampered by several difficulties. One is that binding sites for regulatory proteins are often not significantly over-represented in the set of DNA sequences of interest, because of high levels of false positive predictions, and because of positional restrictions on functional binding sites with regard to the transcription start site.

Results: We have developed a novel method for the detection of regulatory motifs based on their local over-representation in sets of regulatory regions. The method makes use of a Parzen window-based approach for scoring local enrichment, and during evaluation of significance it takes into account GC content of sequences. We show that the accuracy of our method compares favourably to that of other methods, and that our method is capable of detecting not only generally over-represented regulatory motifs, but also locally over-represented motifs that are often missed by standard motif detection approaches. Using a number of examples we illustrate the validity of our approach and suggest applications, such as the analysis of weaker binding sites.

Conclusions: Our approach can be used to suggest testable hypotheses for wet-lab experiments. It has potential for future analyses, such as the prediction of weaker binding sites. An online application of our approach, called LocaMo Finder (Local Motif Finder), is available at http://sysimm.ifrec.osaka-u.ac.jp/tfbs/locamo/.

Highlights

  • Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology

  • Parzen window approaches have been used in bioinformatics for ChIP-seq peak calling [23], but to the best of our knowledge they have never been used for the analysis of regulatory motifs

  • Samples consist of predicted transcription factor binding sites (TFBSs) in promoter sequences of co-expressed genes, and as a window function we use a Gaussian function of the distance to each TFBS
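
The Gaussian window function mentioned in the last highlight can be illustrated with a generic Parzen-window density estimate. The sketch below is a textbook estimator, not the scoring function used by LocaMo Finder: the function name `parzen_density`, the site positions, and the bandwidth of 20 bp are all hypothetical, and the GC-content-aware significance evaluation described in the abstract is not reproduced.

```python
import math


def parzen_density(position, site_positions, bandwidth):
    """Gaussian Parzen-window estimate of predicted-site density at `position`.

    Every predicted site contributes a Gaussian of its distance to `position`;
    the result is normalised so that the estimate integrates to 1.
    """
    if not site_positions:
        return 0.0
    norm = 1.0 / (len(site_positions) * bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(
        math.exp(-0.5 * ((position - site) / bandwidth) ** 2)
        for site in site_positions
    )
    return norm * total


# Hypothetical predicted TFBS positions (bp relative to the TSS), pooled over
# the promoters of a set of co-expressed genes: six sites cluster near -50,
# two lie far upstream.
sites = [-60, -55, -52, -48, -45, -40, -700, -300]

# Evaluate the density every 10 bp across a 1000 bp promoter region.
profile = {x: parzen_density(x, sites, bandwidth=20.0) for x in range(-1000, 1, 10)}
peak = max(profile, key=profile.get)
print(peak, profile[peak])
```

With these made-up positions the profile peaks inside the cluster near the TSS, while the isolated upstream sites contribute only small local bumps. The bandwidth controls how sharply the estimate localises: a narrow window resolves tight clusters, a wide one approaches a global count.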


Summary

Introduction

Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology. Regulation of transcription in eukaryotic cells is controlled by the binding of transcription factors (TFs) to specific binding sites in the regulatory regions of their target genes. One of the many difficulties faced by TFBS detection approaches is that some TFBSs are restricted in their location with regard to the transcription start site (TSS). Computational analyses usually use sequences of a fixed length (for eukaryotes typically 1000 bp or longer). In such cases, the region in which genuine regulatory motifs are positioned is small compared to the input sequence length, which makes position-restricted TFBSs hard to detect using standard approaches.
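
The dilution effect described above can be made concrete with a small back-of-the-envelope calculation. All numbers below are hypothetical and chosen only for illustration (100 promoters of 1000 bp, one chance prediction per 1000 bp, 15 genuine sites confined to a 100 bp window); they are not taken from the paper.

```python
# Hypothetical setting: every value here is an illustrative assumption.
n_promoters = 100          # promoters of co-expressed genes
promoter_len = 1000        # bp analysed per promoter
window_len = 100           # bp region in which genuine sites are located
bp_per_chance_hit = 1000   # one false-positive prediction per 1000 bp
genuine_sites = 15         # genuine sites, all inside the window

# Predictions expected by chance over the full promoters and within the window.
expected_global = n_promoters * promoter_len / bp_per_chance_hit
expected_local = n_promoters * window_len / bp_per_chance_hit

# Observed counts are the chance predictions plus the genuine sites.
global_ratio = (expected_global + genuine_sites) / expected_global
local_ratio = (expected_local + genuine_sites) / expected_local

print(f"global enrichment: {global_ratio:.2f}")
print(f"local enrichment:  {local_ratio:.2f}")
```

Under these assumptions the same 15 genuine sites give only a weak enrichment over the whole 1000 bp region but a much stronger one within the 100 bp window, which is why a position-aware score can pick up motifs that a whole-sequence count misses.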

