Abstract

Background

DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs, with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences.

Results

We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment, including the motifs and the spacer, rather than from selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some of the detected motifs were previously known, but others appear to be novel. Selected motif pairs were subjected to further investigation, and in some cases their possible biological functions were proposed.

Conclusions

We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes.
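
The core idea of the abstract, a pair of pentamers that prefers one spacer length while the spacer sequences themselves stay dissimilar, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_spacer` limit, and the use of mean pairwise identity as a similarity measure are assumptions made for illustration, and the statistical significance test used in the paper is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def find_positions(seq, motif):
    """Return all 0-based start positions of motif in seq (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def spacer_length_counts(seq, motif_a, motif_b, max_spacer=50):
    """Count, per spacer length, how often motif_b starts that many
    bases after the end of an occurrence of motif_a.
    max_spacer is a hypothetical cutoff, not a value from the paper."""
    b_starts = set(find_positions(seq, motif_b))
    counts = Counter()
    for a in find_positions(seq, motif_a):
        a_end = a + len(motif_a)
        for spacer in range(max_spacer + 1):
            if a_end + spacer in b_starts:
                counts[spacer] += 1
    return counts


def spacers_of_length(seq, motif_a, motif_b, spacer_len):
    """Collect the spacer sequences for pairs separated by exactly spacer_len."""
    b_starts = set(find_positions(seq, motif_b))
    spacers = []
    for a in find_positions(seq, motif_a):
        a_end = a + len(motif_a)
        if a_end + spacer_len in b_starts:
            spacers.append(seq[a_end:a_end + spacer_len])
    return spacers


def mean_pairwise_identity(spacers):
    """Mean fraction of identical positions over all pairs of equal-length
    spacers. A value near 1.0 hints at duplicated segments (an assumed,
    simplified stand-in for the similarity filter described above)."""
    pairs = list(combinations(spacers, 2))
    if not pairs or not spacers[0]:
        return 0.0
    total = 0.0
    for s, t in pairs:
        matches = sum(1 for x, y in zip(s, t) if x == y)
        total += matches / len(s)
    return total / len(pairs)
```

On a toy sequence, a pair such as `AAAAC` and `CCCCT` separated three times by different 7-base spacers would show a single peak at spacer length 7 in `spacer_length_counts`, while `mean_pairwise_identity` of those spacers stays low. A real analysis would replace the raw peak with a proper significance test over all pentamer pairs.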

Highlights

  • DNA sequences contain repetitive motifs which have various functions in the physiology of the organism

  • Ab initio detection of candidate functional sequence motifs in complete genomes traditionally relies on word-counting approaches, which rest on the reasoning that selective constraints on a functional sequence motif could lead to a statistically significant excess of motif occurrences in the genome

  • We present a novel motif-finding method based on detection of pairs of sequence motifs with statistically significant preference for a specific distance from each other
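
For contrast with the pair-based method, the word-counting reasoning mentioned in the highlights can be sketched as an observed-to-expected ratio for a single word. The background model here (bases drawn independently at their observed frequencies) and all function names are assumptions chosen for simplicity; published word-counting tools typically use richer background models and formal significance tests.

```python
from collections import Counter


def observed_count(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count(seq, word):
    """Expected count of word if bases were independent and drawn at the
    base frequencies observed in seq (an assumed, simplistic background)."""
    n = len(seq)
    if n == 0:
        return 0.0
    base_freq = Counter(seq)
    p = 1.0
    for base in word:
        p *= base_freq[base] / n
    # Number of windows of length len(word) in the sequence.
    return p * max(n - len(word) + 1, 0)


def observed_to_expected(seq, word):
    """Ratio above 1 indicates the word occurs more often than the
    background model predicts."""
    exp = expected_count(seq, word)
    return observed_count(seq, word) / exp if exp > 0 else float("inf")
```

A ratio well above 1 flags a word as over-represented, which is the signal word-counting methods look for. The pair-based approach does not need this signal, since each pentamer may occur at an unremarkable frequency on its own.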


Summary

Introduction

DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs, with a primary focus on detection of regulatory motifs and transcription factor binding sites (TFBSs). There are two major types of motif-finding algorithms, namely supervised and unsupervised. The former require a sample of known occurrences of the motif and use this information in the search for additional motif occurrences in the analyzed sequence or sequences. Based on the type of DNA sequence information used by the TFBS-finding algorithm, the methods can be classified into three major classes: 1) methods that use promoter sequences from coregulated genes from a single genome [6, 7], 2) methods that use orthologous promoter sequences of a single gene from multiple species [8,9,10] and 3) methods combining 1) and 2) [11, 12]. As a unified portal for online discovery and analysis of sequence motifs, the MEME Suite web server provides various tools for finding motifs representing features such as DNA binding sites and protein interaction domains [13].
