Abstract

CavBase is a database containing information about the three-dimensional geometry and the physicochemical properties of putative protein binding sites. Analyzing CavBase data typically involves computing the similarity of pairs of binding sites. In contrast to sequence alignment, however, the structural comparison of protein binding sites is computationally challenging, which makes large-scale studies difficult or even infeasible. One way to overcome this obstacle is to precompute pairwise similarities in an all-against-all comparison and then make these similarities accessible to data analysis methods. Once computed, the pairwise similarities can also be used to equip CavBase with a neighborhood structure, which allows methods for problems such as similarity retrieval to be implemented efficiently. In this paper, we tackle the problem of performing an all-against-all comparison of CavBase, which contains more than 200,000 protein cavities, by means of parallel computation and cloud computing techniques. We present the conceptual design and technical realization of a large-scale study that creates a similarity database called CavSimBase. We illustrate how CavSimBase is constructed and accessed, and how it is used to answer biological questions through data analysis and similarity retrieval.

