Abstract
BackgroundA structured motif allows variable length gaps between several components, where each component is a simple motif, which allows either no gaps or only fixed length gaps. The motif can either be represented as a pattern or a profile (also called positional weight matrix). We propose an efficient algorithm, called SMOTIF, to solve the structured motif search problem, i.e., given one or more sequences and a structured motif, SMOTIF searches the sequences for all occurrences of the motif. Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences.ResultsSMOTIF can search for both pattern and profile motifs, and it is efficient in terms of both time and space; it outperforms SMARTFINDER, a state-of-the-art algorithm for structured motif search. Experimental results show that SMOTIF is about 7 times faster and consumes 100 times less memory than SMARTFINDER. It can effectively search for LTR retrotransposons and is well suited to searching for motifs with long range gaps. It is also successful in finding potential composite transcription factor binding sites.ConclusionSMOTIF is a useful and efficient tool in searching for structured pattern and profile motifs. The algorithm is available as open-source at: .
Highlights
A structured motif allows variable length gaps between several components, where each component is a simple motif, which allows either no gaps or only fixed length gaps
In the genome of Saccharomyces cerevisiae, the binding sites of transcription factor, GAL4 [1], can be characterized by the simple motif shown in Table 1, which illustrates the pattern over the IUPAC alphabet (ΣIUPAC; see Table 2), as well as its profile
We did the experiments on an Apple G5 with dual 2.7Ghz processors and 4GB memory running Mac OS X; the timings reported are total times for all steps of the algorithms; all our experiments use exact matching for the simple motifs, and we report the full position for the occurrences
Summary
A structured motif allows variable length gaps between several components, where each component is a simple motif, which allows either no gaps or only fixed length gaps. The motif can either be represented as a pattern or a profile ( called positional weight matrix). Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences. Motifs can be represented as either patterns over a specific alphabet, or profiles ( called positional weight matrix (PWM)), which give the probability of observing each symbol in each position. If no variable gaps are allowed in the motif, it is called a simple motif. In the genome of Saccharomyces cerevisiae, the binding sites of transcription factor, GAL4 [1], can be characterized by the simple motif shown, which illustrates the pattern over the IUPAC alphabet (ΣIUPAC; see Table 2), as well as its profile (which gives the frequency of each DNA base at each position).
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.