Searching Protein Three-Dimensional Structures in Faster Than Linear Time

Tetsuo Shibuya

doi:10.1089/cmb.2009.0217

Tetsuo Shibuya

https://doi.org/10.1089/cmb.2009.0217

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Searching for similar structures from a three-dimensional (3-D) structure database of proteins is one of the most important problems in post-genomic computational biology. To compare two structures, we ordinarily use a measure called the root mean square deviation (RMSD) as the similarity measure. We consider a very fundamental problem of finding all the substructures whose RMSDs to the query are within some given threshold, from a 3-D structure database. The problem also appears in many other fields, such as computer vision and robotics. In this article, we propose the first algorithm that runs in faster than linear time on average. Our new algorithm runs in average-case O(m + N/m(1-epsilon)), where N is the database size, m is the query length, and epsilon is an arbitrary small constant such that 0 < epsilon < 1. It is a significant improvement over previous algorithms on the problem, considering that the best known worst-case time complexity of the problem is O(N log m), and the best known average-case (expected) time complexity of the problem was O(N).

Full Text