Abstract
Searching for similar structures from a three-dimensional (3-D) structure database of proteins is one of the most important problems in post-genomic computational biology. To compare two structures, we ordinarily use a measure called the root mean square deviation (RMSD) as the similarity measure. We consider a very fundamental problem of finding all the substructures whose RMSDs to the query are within some given threshold, from a 3-D structure database. The problem also appears in many other fields, such as computer vision and robotics. In this article, we propose the first algorithm that runs in faster than linear time on average. Our new algorithm runs in average-case O(m + N/m(1-epsilon)), where N is the database size, m is the query length, and epsilon is an arbitrary small constant such that 0 < epsilon < 1. It is a significant improvement over previous algorithms on the problem, considering that the best known worst-case time complexity of the problem is O(N log m), and the best known average-case (expected) time complexity of the problem was O(N).
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have