Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs

Bartek Wilczynski, Norbert Dojer, Jerzy Tiuryn, Mateusz Patelak

doi:10.1186/1471-2105-10-82

Open Access

Journal: BMC Bioinformatics	Publication Date: Mar 10, 2009
Citations: 48	License type: cc-by

Affiliation: University of Warsaw

Abstract

Background: Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult.

Results: We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms.

Conclusion: We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.

Highlights

Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale
We show that our method gives reasonable results for other cis-regulatory modules (CRMs) with these parameters: liver specific CRMs in human and the CRMs for the eve gene in D. melanogaster
Parameter estimation – case study of muscle specific CRMs: We estimated the appropriate parameters on a large set of muscle specific CRMs reported by Wasserman and

Summary

Introduction

Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives.

Methods

Results

Discussion

Conclusion