Discovery of Regulatory Elements is Improved by a Discriminatory Approach

Eivind Valen, Ole Winther, Albin Sandelin, Anders Krogh

doi:10.1371/journal.pcbi.1000562

Abstract

A major goal in post-genome biology is the complete mapping of the gene regulatory networks for every organism. Identification of regulatory elements is a prerequisite for realizing this ambitious goal. A common problem is finding regulatory patterns in promoters of a group of co-expressed genes, but contemporary methods are challenged by the size and diversity of regulatory regions in higher metazoans. Two key issues are the small amount of information contained in a pattern compared to the large promoter regions, and the repetitive characteristics of genomic DNA; both lead to “pattern drowning”. We present a new computational method for identifying transcription factor binding sites in promoters using a discriminatory approach with a large negative set encompassing a significant sample of the promoters from the relevant genome. The sequences are described by a probabilistic model, and the most discriminatory motifs are identified by maximizing the probability of the sets given the motif model and prior probabilities of motif occurrences in both sets. Due to the large number of promoters in the negative set, an enhanced suffix array is used to improve speed and performance. Using our method, we demonstrate higher accuracy than the best of contemporary methods, high robustness when extending the length of the input sequences, and a strong correlation between our objective function and the correct solution. Using a large background set of real promoters instead of a simplified model leads to higher discriminatory power and markedly reduces the need for repeat masking, a common pre-processing step for other pattern finders.
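The idea of comparing a positive promoter set against a large negative set, with a suffix array to keep counting fast, can be illustrated with a minimal sketch. This is not the authors' probabilistic model or their enhanced suffix array: it assumes a plain, naively built suffix array and a simple occurrence-rate log ratio with a pseudocount. The function names (`build_index`, `count_occurrences`, `log_enrichment`) and the toy sequences are hypothetical.

```python
import math


def build_index(seqs):
    """Concatenate sequences with '$' separators and build a naive suffix array.

    '$' never occurs in a DNA word, so matches cannot span two sequences.
    Sorting full suffixes is quadratic in the worst case; real tools use
    linear-time construction, which this sketch does not attempt.
    """
    text = "$".join(seqs)
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    return text, suffix_array


def count_occurrences(index, word):
    """Count (possibly overlapping) occurrences of word by binary search."""
    text, sa = index
    k = len(word)

    # Lower bound: first suffix whose length-k prefix is >= word.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < word:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-k prefix is > word.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= word:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def log_enrichment(word, pos_index, neg_index, pseudocount=1.0):
    """Log ratio of per-character occurrence rates, positive vs. negative set.

    An illustrative score only (an assumption of this sketch), not the
    objective function described in the abstract.
    """
    pos_rate = (count_occurrences(pos_index, word) + pseudocount) / len(pos_index[0])
    neg_rate = (count_occurrences(neg_index, word) + pseudocount) / len(neg_index[0])
    return math.log(pos_rate / neg_rate)


# Toy data, invented for illustration.
positives = ["ACGTGACGT", "TTACGTGAA"]
negatives = ["AAAAAAAAAA", "CCCCACGTGC", "GGGGGGGGGG", "TTTTTTTTTT"]
pos_index = build_index(positives)
neg_index = build_index(negatives)
```

Because the negative index is built once and queried many times, each candidate word costs only two binary searches, which is the kind of saving that makes a genome-scale negative set practical. A word over-represented in the positive set gets a positive score, while a repeat-like word such as `AAAAA` that is common in the background gets a negative one.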

Highlights

The rapid emergence of experimental techniques that can probe for functional elements at whole-genome scales [1] necessitates computational methods to analyze data in these settings.
Instead of simplifying the underlying DNA sequence by a general model, we take this to its extreme conclusion and use a very large set of promoters as the actual background instead of building a model describing the sequences in the promoters.
We use the term “negative set” to describe the background set; strictly speaking this is not accurate, since real promoters are sampled randomly and sites could therefore occur in this set, although at a much lower frequency.

Summary

Introduction

The rapid emergence of experimental techniques that can probe for functional elements at whole-genome scales [1] necessitates computational methods to analyze data in these settings. Since the binding preferences of transcription factors (TFs) are not captured by a single word or consensus string, pattern-based approaches can give solutions closer to the biological reality, and it has been argued that the matrix score is related to the binding energy [7,8]. Such approaches correspond to the problem of finding local, optimal multiple alignments, which is NP-complete [9]. Almost all pattern-based motif finders use statistical optimization methods such as Gibbs sampling or expectation maximization [10,11].
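To make the contrast between a consensus string and a matrix-based pattern concrete, here is a minimal sketch of a position weight matrix with log-odds scores. It assumes a uniform background, a fixed pseudocount, and a handful of invented aligned sites; the helper names `log_odds_matrix` and `best_hit` are hypothetical and not taken from the paper.

```python
import math

ALPHABET = "ACGT"


def log_odds_matrix(sites, background=None, pseudocount=0.5):
    """Build a log-odds (base 2) matrix from equal-length aligned sites."""
    if background is None:
        # Assumption: uniform background; real promoters are not uniform.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    matrix = []
    for col in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return matrix


def best_hit(matrix, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Invented aligned sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = log_odds_matrix(sites)
score, start = best_hit(pwm, "CCCCTGACTCACCCC")
```

Unlike a consensus string, the matrix tolerates the variable fourth and sixth positions by giving them smaller weights, so a near-match still scores well. Motif finders such as those based on Gibbs sampling or expectation maximization optimize matrices of this kind rather than scoring a fixed one.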

Journal: PLoS Computational Biology
Publication Date: Nov 13, 2009
Citations: 27
License type: CC BY 4.0
