Abstract

Cellular regulation mechanisms that involve proteins and other active molecules interacting with specific targets often involve the recognition of sequence patterns. Short sequence elements on DNA, RNA and proteins play a central role in mediating such molecular recognition events. Studies that focus on measuring and investigating sequence-based recognition processes make use of statistical and computational tools that support the identification and understanding of sequence motifs. We present a new web application, named DRIMust, freely accessible through the website http://drimust.technion.ac.il for de novo motif discovery services. The DRIMust algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for an efficient enumeration of motif candidates. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the determination of the threshold that defines top is data driven. The resulting motifs are presented individually with an accurate P-value indication and as a Position Specific Scoring Matrix. Comparing DRIMust with other state-of-the-art tools demonstrated significant advantage to DRIMust, both in result accuracy and in short running times. Overall, DRIMust is unique in combining efficient search on large ranked lists with rigorous P-value assessment for the detected motifs.

Highlights

  • The study of sequence elements that enable molecular recognition in a variety of cellular processes is an important component in improving our understanding of regulation in living cells

  • The occurrence of short binding motifs in RNA molecules plays a central role in enabling controlled regulation by RNA-binding proteins (RBPs) and by microRNAs

  • The main advantage of our method, implemented in DRIMust, is that it searches for enriched motifs in the entire ranked list and does not require defining a fixed set of sequences as in the case of other motif-search algorithms such as Multiple EM for Motif Elicitation (MEME) [14], PhyloGibbs [40] and others

Read more

Summary

Introduction

The study of sequence elements that enable molecular recognition in a variety of cellular processes is an important component in improving our understanding of regulation in living cells. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the determination of the threshold that defines top is data driven.

Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call