Abstract

Multiple sequence alignment is a class of powerful bioinformatics tools with many uses in computational biology, ranging from the discovery of characteristic motifs and conserved regions in protein families to improved prediction of secondary and tertiary structure. As rapidly growing data repositories give scientists far more data on which to base decisions, it is increasingly important to run these multiple alignment calculations as quickly as possible. Although several multiple alignment algorithms have been developed, they remain computationally expensive, taking as long as 2 to 3 days for some queries. In this paper, we propose a new caching technique to improve the performance of multiple sequence alignment algorithms. In particular, we propose a nested two-level cache hierarchy that caches pairwise alignment results, a computationally expensive subcomponent of multiple sequence alignment algorithms. A key contribution of our work is the development of two novel cache replacement policies that closely track the scientist’s query patterns over time. We present experimental results that validate the benefits of caching over repeated computation of the alignments, provide heuristics for determining which alignments would benefit from caching, and show the effectiveness of the developed cache replacement policies.
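
To make the idea of caching pairwise alignment results concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes a small first-level cache backed by a larger second-level cache, and it uses plain least-recently-used (LRU) eviction as a stand-in for the paper's two replacement policies, which the abstract does not specify. The names `TwoLevelPairwiseCache`, `align_fn`, and `toy_score` are hypothetical, and the scoring function is a placeholder for a real pairwise alignment that is assumed to be symmetric in its two sequences.

```python
from collections import OrderedDict


class TwoLevelPairwiseCache:
    """Illustrative two-level cache for pairwise alignment results.

    Assumption: both levels use LRU eviction; the paper's own
    replacement policies are not reproduced here.
    """

    def __init__(self, align_fn, l1_size=2, l2_size=4):
        self.align_fn = align_fn      # expensive pairwise alignment (hypothetical)
        self.l1 = OrderedDict()       # small, checked first
        self.l2 = OrderedDict()       # larger, checked on an L1 miss
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.misses = 0               # number of alignments actually computed

    @staticmethod
    def _key(a, b):
        # Order the pair so (a, b) and (b, a) share one cache entry.
        return (a, b) if a <= b else (b, a)

    def _put_l2(self, key, value):
        self.l2[key] = value
        self.l2.move_to_end(key)
        if len(self.l2) > self.l2_size:
            self.l2.popitem(last=False)   # drop the least recently used entry

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_size:
            # Demote the least recently used L1 entry to L2.
            old_key, old_value = self.l1.popitem(last=False)
            self._put_l2(old_key, old_value)

    def get(self, a, b):
        key = self._key(a, b)
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:
            value = self.l2.pop(key)      # promote from L2 to L1
        else:
            self.misses += 1
            value = self.align_fn(*key)   # compute only on a full miss
        self._put_l1(key, value)
        return value


def toy_score(a, b):
    # Placeholder for a real alignment: counts identical positions.
    return sum(x == y for x, y in zip(a, b))


cache = TwoLevelPairwiseCache(toy_score, l1_size=1, l2_size=2)
first = cache.get("ACGT", "ACGA")     # computed
second = cache.get("ACGA", "ACGT")    # same pair, served from the first level
```

In this sketch a repeated query for the same pair of sequences, in either order, is answered from the cache, so the expensive alignment runs only on a miss in both levels.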
