Abstract

Background: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set.

Results: We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position-specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence.

Conclusion: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances.
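The scanning loop described in the abstract (score every window, convert scores to E-values, let windows under the threshold contribute to the PSSM, repeat until no new window contributes) can be illustrated with a small Python sketch. This is not A-GLAM's implementation. The function names, the uniform background, the pseudocount smoothing that stands in for the Bayesian update, and the E-value definition (exact tail probability over all possible words, multiplied by the number of windows scanned) are all assumptions made for illustration, and the enumeration of all words is only practical for short motifs.

```python
import math
from bisect import bisect_left
from itertools import product

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequencies


def build_pssm(sites, pseudocount=1.0):
    """Log-odds PSSM (in bits) from equal-length sites, smoothed with pseudocounts."""
    pssm = []
    for column in zip(*sites):
        total = len(column) + pseudocount
        pssm.append({
            base: math.log2(
                (column.count(base) + pseudocount * BACKGROUND) / total / BACKGROUND
            )
            for base in ALPHABET
        })
    return pssm


def individual_score(pssm, word):
    """Sum of column scores; rounded so equal-scoring words tie exactly."""
    return round(sum(col[base] for col, base in zip(pssm, word)), 6)


def null_scores(pssm):
    """Sorted scores of every possible word of the motif width."""
    return sorted(
        individual_score(pssm, word)
        for word in product(ALPHABET, repeat=len(pssm))
    )


def e_value(score, null, n_windows):
    """Hypothetical E-value: background tail probability times windows scanned."""
    p_value = (len(null) - bisect_left(null, score)) / len(null)
    return p_value * n_windows


def scanning_step(sequences, seeds, width, threshold=0.05, max_iter=50):
    """Iterate scanning until no new window joins the PSSM.

    seeds: (sequence index, start position) pairs, e.g. one per sequence
    from a prior Gibbs sampling run. Overlapping hits are not filtered.
    """
    windows = [(i, j) for i, seq in enumerate(sequences)
               for j in range(len(seq) - width + 1)]
    contributing = set(seeds)
    hits = {}
    for _ in range(max_iter):
        sites = [sequences[i][j:j + width] for i, j in sorted(contributing)]
        pssm = build_pssm(sites)
        null = null_scores(pssm)
        hits = {}
        for i, j in windows:
            score = individual_score(pssm, sequences[i][j:j + width])
            e = e_value(score, null, len(windows))
            if e <= threshold:
                hits[(i, j)] = e
        new = set(hits) - contributing
        if not new:  # convergence: no new subsequence contributes
            break
        contributing |= new
    # Report in order of increasing E-value, ties broken by position.
    return sorted(hits.items(), key=lambda item: (item[1], item[0]))


# Toy input: the word TGACTC is planted once or twice per sequence.
sequences = [
    "ACGTTGACTCGGATCCTGACTCAT",
    "TGACTCAAGCTTGGCCATATGCAA",
    "CCGGAATTTGACTCTTTGACTCGG",
]
seeds = [(0, 4), (1, 0), (2, 8)]  # one seed per sequence
for (seq_index, position), e in scanning_step(sequences, seeds, width=6):
    print(seq_index, position, f"{e:.4f}")
```

The contributing set only grows in this sketch, so the loop must terminate; a real implementation would also need a policy for overlapping hits and a score distribution that scales to longer motifs.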

Highlights

  • Many DNA regulatory elements occur as multiple instances within a target promoter

  • Anchored Gapless Local Alignment of Multiple Sequences (A-GLAM) can start from a set of "seeds", e.g., statistically significant positions from word enumeration, to maximize the log-odds score over all possible gapless alignments containing the seeds

  • Prediction performance of A-GLAM: A-GLAM's predictions of transcription factor binding sites were evaluated with reference sets containing known functional sites


Summary

Introduction

Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set. Combinatorial gene regulation is a major factor in evolution, because it helps coordinate diverse novel phenotypic features in a new species. Because it often reflects chemical synergies between tran- [...]

Our previous work [16] produced the A-GLAM computer program, which combines word enumeration with probabilistic sequence models to identify cis-regulatory sequences in human promoters, as follows. Given any gapless subsequence alignment, probabilistic sequence models yield a marginal Bayesian log-odds score. A-GLAM can start from a set of "seeds", e.g., statistically significant positions from word enumeration, to maximize the log-odds score over all possible gapless alignments containing the seeds.
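To make the idea of a marginal Bayesian log-odds score for a gapless alignment concrete, the sketch below scores each alignment column with a Dirichlet-multinomial marginal likelihood divided by the background likelihood, and sums over columns. This is one standard form of such a score, offered as an illustration; the text above does not specify A-GLAM's exact formula, and the names `column_log_odds`, `alignment_log_odds`, and `prior_weight`, the uniform background, and the choice of a prior proportional to the background are all assumptions.

```python
import math

ALPHABET = "ACGT"


def column_log_odds(column, background, prior_weight=1.0):
    """Log (natural) of marginal motif likelihood over background likelihood.

    The motif model integrates base probabilities over a Dirichlet prior with
    parameters prior_weight * background[base] (an assumed prior).
    """
    n = len(column)
    score = math.lgamma(prior_weight) - math.lgamma(prior_weight + n)
    for base in ALPHABET:
        count = column.count(base)
        alpha = prior_weight * background[base]
        score += math.lgamma(alpha + count) - math.lgamma(alpha)
        score -= count * math.log(background[base])
    return score


def alignment_log_odds(rows, background=None, prior_weight=1.0):
    """Sum of column scores for equal-length, gapless aligned rows."""
    if background is None:
        background = {base: 0.25 for base in ALPHABET}  # assumption: uniform
    return sum(
        column_log_odds(column, background, prior_weight)
        for column in zip(*rows)
    )


conserved = ["TGACTC"] * 4                   # every column fully conserved
scrambled = ["ACGT", "CGTA", "GTAC", "TACG"]  # every column has one of each base
print(alignment_log_odds(conserved) > alignment_log_odds(scrambled))
```

With this prior, a single-row alignment scores zero, conserved columns score positive, and columns that look like background score negative, which is the behaviour a maximization over candidate alignments relies on.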

