Abstract

An updated version of the 3D_ali databank (Pascarella and Argos, 1992) was constructed to incorporate new protein structural and sequence data acquired since the original release in 1992. The databank has proved useful in many research fields, including protein sequence and structure analysis and comparison, protein folding, engineering and design, and evolution. The collection enhances present protein structural knowledge by merging information from proteins having a similar main-chain fold with homologous primary structures taken from large databases of known sequences. However, the construction philosophy of the databank has been modified. Originally, the Protein Data Bank (PDB; Bernstein et al., 1977) of known 3-D structures was exhaustively scanned for fold redundancy, and all unique structures were incorporated either in multiple structural alignment files or, where no relatives could be detected, in files containing only one structure. The tertiary structural superposition of the backbones, which yielded spatial and topological Cα atom equivalences and thus the corresponding sequence alignments, was mostly taken from the literature, but was sometimes determined by the authors using the Rossmann-Argos superposition technique (Argos and Rossmann, 1979). In the updated 3D_ali databank, only published alignments based on superposition performed by the authors of the tertiary structures were collected, and only folds with more than one sample structure were considered. Different literature alignments were also merged if they included common folds. As in the former release, only full coordinate sets with assigned side chains were included; NMR structures, excluded in the 1992 release, are now incorporated. The features of 3D_ali offer several advantages.
No systematic error, such as could result from automatic topological equivalence methods, is introduced in the structural alignments, because they follow the published literature and are derived by the authors of the structures themselves, who carefully considered hydrogen bond patterns, functional features, loop alterations, dihedral angles and other structural peculiarities. Subsequences that cannot be reliably aligned in 3-D, such as those in loops, are not matched. Only folded and functional domains in proper N- to C-terminal order are considered, not substructural segments comprising a few helices and strands. Related sequences without known architecture are added only when the residue identity level assures accuracy in alignment to

