Real time structural search of the Protein Data Bank.

Dmytro Guzenko, Stephen K Burley, Jose M Duarte, Charlotte M Deane

doi:10.1371/journal.pcbi.1007970

Abstract

Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with the recent growth in the number of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment and retrieval. Traditional methods use atom-level information, which can be complicated by topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. However, brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins and protein assemblies resembling a target, taken from the PDB or from user input, together with the resulting alignments (http://shape.rcsb.org).

Highlights

Structure similarity searching within the growing Protein Data Bank (PDB) archive [1, 2] revolutionized our understanding of protein evolution [3, 4, 5]
A volumetric function can be decomposed using 3D Zernike polynomials, yielding a compact description as a vector of Zernike moments
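To illustrate why a compact vector description makes retrieval fast, here is a minimal sketch that ranks a small library of descriptor vectors by their distance to a query vector. It assumes the Zernike moment descriptors have already been computed elsewhere. The function name `rank_by_descriptor_distance`, the toy vectors, and the use of plain Euclidean distance are all assumptions made for illustration and are not taken from the paper, whose actual scoring scheme is not described in this section.

```python
import numpy as np


def rank_by_descriptor_distance(query, library, top_k=3):
    """Return (index, distance) pairs for the top_k library rows closest to query.

    Hypothetical helper: Euclidean distance stands in for whatever
    similarity score a real system would use.
    """
    query = np.asarray(query, dtype=float)
    library = np.asarray(library, dtype=float)
    # One distance per library entry, computed in a single vectorized pass.
    distances = np.linalg.norm(library - query, axis=1)
    # Indices of the smallest distances, nearest first.
    order = np.argsort(distances)[:top_k]
    return [(int(i), float(distances[i])) for i in order]


# Toy, made-up descriptor vectors (not real Zernike moments).
library = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
query = [1.0, 0.05, 0.0, 0.0]

hits = rank_by_descriptor_distance(query, library, top_k=2)
for index, distance in hits:
    print(f"library entry {index}: distance {distance:.4f}")
```

Because each structure is reduced to a fixed-length vector, comparing a query against the whole library costs one array operation rather than one 3D alignment per candidate.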

Summary

Introduction

Structure similarity searching within the growing PDB archive [1, 2] revolutionized our understanding of protein evolution [3, 4, 5]. With more than 150,000 publicly available PDB structures, efficient methods for detecting and quantifying protein structure similarity are essential. Structure superposition tools were initially developed in the 1970s [6, 7], and the first algorithms for general structural alignment came in the 1990s [8, 9, 10], with more advanced methods appearing over the following decade [11, 12, 13, 14]. As the PDB grew, efficient searching of the entire archive became both important and difficult. These alignment methods have focused on aligning single polypeptide chains or parts thereof.

Methods

Results

Discussion

Conclusion