Abstract

Unlike structures determined by X-ray crystallography, which are deposited in the Brookhaven Protein Data Bank (Abola et al., 1987) as a single structure, each NMR-derived structure is often deposited as an ensemble containing many structures, each consistent with the restraint set used. However, there is often a need to select a single 'representative' structure, or a 'representative' subset of structures, from such an ensemble. This is useful, for example, in the case of homology modelling or when compiling a relational database of protein structures. It has been shown that cluster analysis, based on overall fold, followed by selection of the structure closest to the centroid of the largest cluster, is likely to identify a structure more representative of the ensemble than the commonly used minimized average structure (Sutcliffe, 1993). Two approaches to the problem of clustering ensembles of NMR-derived structures have been described. One of these (Adzhubei et al., 1995) performs the pairwise superposition of all structures using Cα atoms to generate a set of r.m.s. distances. After cluster analysis based on these distances, a user-defined cut-off is required to determine the final membership of clusters and therefore the representative structures. The other approach (Diamond, 1995) uses collective superpositions and rigid-body transformations. Again, the position at which to draw a cut-off based on the particular clustering pattern was not addressed. Whenever fixed values are used for the cut-off in clustering, there is a danger of missing 'true' clusters under the threshold imposed by the rigid cut-off value. Considering the highly diverse nature of NMR-derived ensembles of proteins, it would seem most appropriate to avoid the use of predefined values for determining clusters. In fact, of the 302 ensembles we have studied, the average pairwise r.m.s. distance across an ensemble varied from 0.29 to 11.3 Å (mean value 3.0, SD 1.9 Å).
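To make the dependence on a fixed cut-off concrete, the following is a minimal sketch, not the published program, of clustering a matrix of pairwise r.m.s. distances and then picking a representative from the largest cluster. It assumes average linkage as the merging rule, a user-supplied cut-off, and a made-up 5 x 5 distance matrix. The names `average_linkage`, `representative` and `DIST` are hypothetical, and "closest to the centroid" is approximated here by the member with the smallest summed distance to its cluster, which is an assumption rather than the centroid calculation of the cited work.

```python
from itertools import combinations

# Hypothetical pairwise r.m.s. distances (in Å) for five structures.
# Values are invented for illustration only.
DIST = [
    [0.0, 0.5, 0.6, 3.0, 3.2],
    [0.5, 0.0, 0.4, 2.9, 3.1],
    [0.6, 0.4, 0.0, 3.0, 3.3],
    [3.0, 2.9, 3.0, 0.0, 0.7],
    [3.2, 3.1, 3.3, 0.7, 0.0],
]


def average_linkage(dist, cutoff):
    """Agglomerative clustering of items 0..n-1 with average linkage.

    Merging stops once the closest pair of clusters is farther apart
    than `cutoff` (the user-defined value discussed in the text).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage: mean of all inter-cluster pairwise distances.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage(clusters[p], clusters[q]) > cutoff:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)

    # Largest cluster first; ties broken by member indices.
    return sorted(clusters, key=lambda c: (-len(c), c))


def representative(cluster, dist):
    """Member with the smallest summed distance to its own cluster.

    This is a stand-in (assumption) for 'closest to the centroid'.
    """
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


if __name__ == "__main__":
    for cutoff in (0.45, 1.0, 5.0):
        found = average_linkage(DIST, cutoff)
        print(cutoff, found, representative(found[0], DIST))
```

With this toy matrix, a cut-off of 1.0 Å separates the structures into two clusters, 0.45 Å leaves one pair and three singletons, and 5.0 Å merges everything into one cluster. The same data therefore yield three different answers depending only on the chosen threshold, which is the difficulty the text attributes to fixed cut-off values.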
Here we present an automated method for cut-off determination that avoids the dangers of using fixed values for this purpose. We have developed a computer program that automatically, systematically and rapidly (i) clusters an ensemble of structures into a set of conformationally related subfamilies, and (ii) selects a representative structure from each cluster. The program uses the method of average linkage to define how clusters are built up, followed by the application of a penalty function that seeks to minimize simultaneously the number of clusters
