Abstract

Background: This paper addresses the problem of recognising DNA cis-regulatory modules which are located far from genes. Experimental procedures for this are slow and costly, and computational methods are hard, because they lack positional information.

Results: We present a novel statistical method, the "fluffy-tail test", to recognise regulatory DNA. We exploit one of the basic informational properties of regulatory DNA: abundance of over-represented transcription factor binding site (TFBS) motifs, although we do not look for specific TFBS motifs per se. Though over-representation of TFBS motifs in regulatory DNA has been intensively exploited by many algorithms, it is still a difficult problem to distinguish regulatory from other genomic DNA.

Conclusion: We show that, in the data used, our method is able to distinguish cis-regulatory modules by exploiting statistical differences between the probability distributions of similar words in regulatory and other DNA. The potential applications of our method include annotation of new genomic sequences and motif discovery.
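To make the idea of comparing "similar word" statistics more concrete, the sketch below counts, for every word of length k in a sequence, how many words in the same sequence lie within a small Hamming distance of it, and then compares the largest such count with the counts obtained from shuffled copies of the sequence. This is only an illustration and not the authors' implementation: the word length, the mismatch threshold, the shuffling null model, the standardised score, and the function names `max_similar_word_count` and `tail_score` are all assumptions introduced here.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def max_similar_word_count(seq, k=5, max_mismatch=1):
    """Largest number of k-mer occurrences within max_mismatch of any one k-mer.

    Assumption: "similar" means Hamming distance <= max_mismatch.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    best = 0
    for word in counts:
        total = sum(c for other, c in counts.items()
                    if hamming(word, other) <= max_mismatch)
        best = max(best, total)
    return best


def tail_score(seq, k=5, max_mismatch=1, n_shuffles=50, seed=0):
    """Standardised excess of the observed maximum over shuffled sequences.

    Assumption: shuffling letters is used as the null model; the source
    does not specify the randomisation or the exact statistic.
    """
    rng = random.Random(seed)
    observed = max_similar_word_count(seq, k, max_mismatch)
    letters = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(max_similar_word_count("".join(letters), k, max_mismatch))
    sigma = pstdev(null)
    if sigma == 0:
        return 0.0  # no variation in the null sample; score is undefined
    return (observed - mean(null)) / sigma
```

Under these assumptions, a sequence containing many near-copies of a short motif would be expected to score higher than its shuffled counterparts, which is the kind of difference the abstract describes between regulatory and other DNA.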

Highlights

  • This paper addresses the problem of recognising DNA cis-regulatory modules which are located far from genes

  • Methods for recognising regulatory DNA may be divided into several approaches, the first being recognition of regulatory DNA regions based on descriptions of known transcription factor binding sites (TFBS)

  • We examine the relationship between the maximal similar word list (MSWL) and predicted TFBSs


Summary

Introduction

This paper addresses the problem of recognising DNA cis-regulatory modules which are located far from genes. The identification of regulatory regions, which are generally composed of dense clusters of target transcription factor binding sites, forms an essential step in understanding the regulatory interactions that govern the spatial and temporal expression of individual genes (see for example [1,2]) and genetic regulatory networks (see for example [3]). This task is accomplished experimentally using techniques such as empirical deletion analysis, direct binding measurements, and co-precipitation of protein-DNA complexes. Experimental verification is
