Abstract

The rapid increase in available protein structure datasets requires new techniques for fast yet effective analysis of protein 3D structures. In this work, we propose a structure-based signature for protein families, suitable for rapid analysis of multidomain protein structures. Our method is alignment-free and uses protein strings as the basic representation. A key novelty is the two-stage approach: an initial list of candidate protein superfamilies is rapidly identified using the protein family signature, and information retrieval methods are then applied only to the members of the candidate superfamilies. This approach is the key to both improved speed and improved structure retrieval accuracy. Experimental results, including comparisons with state-of-the-art methods, demonstrate the performance of the proposed protein family signature on queries with multidomain protein structures.
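
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how protein strings are encoded, how the family signature is built, or which information retrieval method is used. The sketch assumes, purely for illustration, that a protein string is a sequence of letters from some structural alphabet, that a superfamily signature is the summed k-mer count profile of its members, and that cosine similarity is the ranking score in both stages. All function names, parameters, and the toy database are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(protein_string, k=3):
    """Count overlapping k-mers in a protein string (assumed representation)."""
    return Counter(protein_string[i:i + k]
                   for i in range(len(protein_string) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def family_signature(members, k=3):
    """Assumed signature: summed k-mer counts over all members of a superfamily."""
    signature = Counter()
    for protein_string in members.values():
        signature.update(kmer_profile(protein_string, k))
    return signature


def two_stage_search(query, database, n_candidates=1, k=3):
    """Return (score, superfamily, protein_id) hits, best first."""
    q = kmer_profile(query, k)

    # Stage 1: rank superfamilies by signature similarity and keep the top few.
    # Signatures are rebuilt here for brevity; they could be precomputed once.
    signatures = {sf: family_signature(members, k)
                  for sf, members in database.items()}
    candidates = sorted(signatures,
                        key=lambda sf: cosine(q, signatures[sf]),
                        reverse=True)[:n_candidates]

    # Stage 2: score only the members of the candidate superfamilies.
    hits = [(cosine(q, kmer_profile(s, k)), sf, pid)
            for sf in candidates
            for pid, s in database[sf].items()]
    return sorted(hits, reverse=True)


# Toy data: two hypothetical superfamilies over a made-up structural alphabet.
toy_database = {
    "sf_alpha": {"p1": "AAABAAAB", "p2": "AABAAABA"},
    "sf_beta": {"p3": "CCDCCDCC", "p4": "CDCCDCCD"},
}
results = two_stage_search("AAABAAB", toy_database, n_candidates=1)
```

In this sketch, the potential speed gain comes from stage 1 comparing the query against one signature per superfamily rather than against every structure, so stage 2 touches only a small part of the database.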
