Abstract

Motif search is a fundamental problem in bioinformatics with an important application in locating transcription factor binding sites (TFBSs) in DNA sequences. Exact algorithms can report all (l, d) motifs and find the best one under a specific objective function. However, identifying weak motifs remains challenging, because current exact algorithms require either a large amount of memory or a long execution time. This paper proposes PairMotif, a new exact algorithm for planted (l, d) motif search (PMS). To reduce both the number of candidate motifs and the number of scanned l-mers, PairMotif selects multiple pairs of l-mers with relatively large distances from the input sequences and uses them to restrict the search space. Comparisons with several recently proposed algorithms show that PairMotif requires less storage space and runs faster on most PMS instances. In particular, among the algorithms compared, only PairMotif can solve the weak instance (27, 9) within 10 hours. Moreover, the performance of PairMotif is stable with respect to sequence length, which allows it to identify motifs in longer sequences. Experimental results on real biological data demonstrate the validity of the proposed algorithm.

Highlights

  • Motif search plays an important role in gene finding and in understanding gene regulatory relationships

  • The study does not use any human or vertebrate animal subjects or tissue; it focuses on faster algorithms for planted (l, d) motif search, a widely used computational model for DNA motif search, and all experiments are carried out on computers

  • We mainly compare the running time of PairMotif with that of other well-known exact algorithms, since all exact algorithms report the same results and differ only in time overhead


Summary

Introduction

Motif search plays an important role in gene finding and in understanding gene regulatory relationships. Vine [10], a recent method, is a polynomial-time heuristic algorithm for motif search based on WINNOWER [2]. According to the search space of PMS, there are two types of exact recognition algorithms. One type comprises exact algorithms based on an alignment matrix, which test all (n - l + 1)^t possible combinations of motif positions across the t input sequences of length n to find the combination that yields the highest score. PMS5 [22], whose main idea is to use integer programming to compute the common d-neighbors of three l-mers, is an efficient algorithm for solving weak PMS instances with l of about 20. PMS5 has difficulty solving weak instances with large values of l, because of the substantial memory required to store the results of all possible integer linear programs. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithm.
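To make the terms above concrete, the following is a minimal Python sketch of the planted (l, d) motif search problem itself. It is a naive brute-force illustration, not PairMotif and not PMS5 (it uses no pair selection and no integer programming). All function names (`hamming`, `d_neighbors`, `occurs_within`, `planted_motifs`, `alignment_count`) and the toy sequences are assumptions introduced here for illustration. It assumes the usual PMS setting: an l-mer M is an (l, d) motif if every input sequence contains some l-mer within Hamming distance d of M.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # DNA alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def d_neighbors(lmer, d):
    """All strings of the same length within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        # Choose exactly k positions to mutate.
        for positions in combinations(range(len(lmer)), k):
            # At each chosen position, try every letter that differs.
            choices = [[c for c in ALPHABET if c != lmer[p]] for p in positions]
            for letters in product(*choices):
                s = list(lmer)
                for p, c in zip(positions, letters):
                    s[p] = c
                result.add("".join(s))
    return result


def occurs_within(candidate, seq, d):
    """True if some l-mer of seq is within distance d of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d):
    """Brute force: any motif must be a d-neighbor of an l-mer in seqs[0]."""
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= d_neighbors(first[i:i + l], d)
    return sorted(
        m for m in candidates
        if all(occurs_within(m, s, d) for s in seqs[1:])
    )


def alignment_count(n, l, t):
    """Combinations of motif positions in t sequences of length n."""
    return (n - l + 1) ** t


if __name__ == "__main__":
    toy = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]  # hypothetical toy input
    print(planted_motifs(toy, l=4, d=1))
    print(alignment_count(n=8, l=4, t=3))
```

In the toy input, `ACGT` appears exactly in the first sequence and with one mismatch in each of the other two, so it is among the reported (4, 1) motifs. The candidate set in this sketch grows quickly with l and d, which is one way to see why weak instances are expensive and why restricting the search space matters.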

Methods
Results and Discussion
Conclusions
