Abstract

Before an electron density image of a macromolecule can be constructed from the measured intensities of the X-ray waves diffracted by a crystal of the macromolecule, the phase of each wave has to be determined. In the last decade, the energy dependence of X-ray scattering has become one of the standard tools for accomplishing this feat. The multiple wavelength anomalous dispersion (MAD) technique [1: Hendrickson W.A. Ogata C.M. Phase determination from multiwavelength anomalous diffraction measurements. Methods Enzymol. 1997; 276: 494-522] has grown to become the method of choice for an ever-increasing number of structural biologists. In a recent review [2: Hendrickson W.A. Maturation of MAD phasing for the determination of macromolecular structures. J. Synchr. Rad. 1999; 6: 845-851] it was anticipated that the importance of MAD will grow even further. The reasons for this are manifold: (i) strong, tunable synchrotron beamlines and fast detectors are widely available nowadays, (ii) with modern cryocrystallography techniques, a complete diffraction data set can in most cases be collected from one crystal, (iii) phase determination by MAD does not suffer from the lack of isomorphism, as does the multiple isomorphous replacement method, and last but not least, (iv) modern molecular biology techniques make it possible to produce selenomethionine-containing proteins easily and in large quantities. Nevertheless, in a recent study on seven MAD structure determinations of Se-Met proteins, it has been argued that the collection of the peak wavelength alone in combination with solvent flattening would have been sufficient for solving the structures [3: Rice L.M. Earnest T.N. Brünger A.T. Single-wavelength anomalous diffraction phasing revisited. Acta Crystallogr. D. 2000; 56: 1413-1420].
The single-wavelength anomalous scattering (SAS, sometimes also called SAD) approach that was originally proposed by Wang [4: Wang B.-C. Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 1985; 115: 90-112] should therefore constitute a more time-efficient alternative to a complete MAD analysis. An inherent disadvantage of SAS is, however, that it is impossible to obtain a unimodal phase probability distribution. In order to resolve the resulting phase ambiguity, it is therefore necessary to combine the SAS approach with density modification or other techniques. In principle, the SAS method can also be employed to solve the structure of native proteins based on the anomalous signal provided by sulfur atoms and bound metal or other ions. Here, the signal is significantly weaker than what is usually encountered when the anomalous substructure contains heavy atoms. Even so, there have been some examples of successful structure determinations, the earliest one being the hydrophobic protein Crambin (Mr = 4.8 kDa, six sulfur atoms) [5: Hendrickson W.A. Teeter M.M. Structure of the hydrophobic protein crambin determined directly from the anomalous scattering of sulphur. Nature. 1981; 290: 107-113]. A more recent example is hen egg-white Lysozyme, for which Dauter et al. [6: Dauter Z. Dauter M. de la Fortelle E. Bricogne G. Sheldrick G.M. Can anomalous signal of sulfur become a tool for solving protein structures? J. Mol. Biol. 1999; 289: 83-92] could show by using synchrotron radiation at that the anomalous scattering of ten sulfur atoms and eight chloride ions was sufficient for phase determination. This was subsequently shown to work with CuKα radiation from a rotating anode as well, provided that the data were collected at very high redundancy [7: Weiss M.S. Global indicators of X-ray data quality. J. Appl. 
Crystallogr. 2001; 34: 130-135]. The 22.3 kDa protein Obelin, with eight sulfur atoms and one bound chloride ion, constitutes another recent example. In this case, the authors enhanced the anomalous signal by collecting the diffraction data at a wavelength of [8: Liu Z.-J. Vysotski E.S. Chen C.-J. Rose J.P. Lee J. Wang B.-C. Structure of the Ca2+-regulated photoprotein obelin at 1.7 Å resolution determined directly from its sulfur substructure. Protein Science. 2000; 9: 2085-2093]. The anomalous signal of one Zn2+ and one Ca2+ at λ = 0.863 Å was used successfully for phasing the 11.4 kDa protein Psoriasin [9: Brodersen D.E. de la Fortelle E. Vonrhein C. Bricogne G. Nyborg J. Kjeldgaard M. Applications of single-wavelength anomalous dispersion at high and atomic resolution. Acta Crystallogr. D. 2000; 56: 431-441], and the sulfur anomalous signal obtained with CuKα radiation proved to be sufficient for phasing the macrocyclic antibiotic thiostrepton (Mr = 1.7 kDa, five sulfur atoms) [10: Bond C.S. Shaw M.P. Alphey M.S. Hunter W.N. Structure of the macrocycle thiostrepton solved using the anomalous dispersion contribution of sulfur. Acta Crystallogr. D. 2001; 57: 755-758]. In order to make the sulfur SAS approach generally applicable, it appears to be necessary either to increase the weak anomalous signal that is present, to increase the accuracy of data collection and processing, or both. Recently, we have carried out a pilot experiment at the X-ray diffraction beamline of the ELETTRA synchrotron (Trieste, Italy) and were able to show that longer wavelengths can be routinely employed in macromolecular crystallography [11: Weiss M.S. Sicker T. Djinovic-Carugo K. Hilgenfeld R. On the routine use of soft X-rays in macromolecular crystallography. Acta Crystallogr. D. 
2001; 57: 689-695] without having to make time-consuming changes to the beamline setup. This encouraged us to proceed and make use of the longer wavelengths (λ > 1.5 Å) to enhance the small anomalous signal present in native protein crystals. We can now demonstrate that the combination of using long wavelengths for data collection, of collecting highly accurate data, and of employing an optimized protocol for scaling the data is sufficient to phase a 35 kDa protein based on anomalous intensity differences as small as 1.3%. Furthermore, if the nominal resolution of the recorded diffraction data extends to about 2.0 Å or farther, the phases provided by our approach constitute a good starting point for automating the determination of the structure. Especially in the light of all the structural-genomics projects undertaken worldwide, this may have the potential to become one of the standard approaches for nearly completely automated protein structure determination.

The procedure developed and described here is based on three major pillars: (i) the use of soft X-rays in the wavelength range (this corresponds to the energy range 8000–4000 eV) for diffraction data collection, (ii) the collection of highly redundant diffraction data, and (iii) a suitable scaling protocol. For the structure determination step, only standard computer software and more or less standard protocols are employed. The first part of the method described is the collection of diffraction data at longer wavelengths. Since the anomalous diffraction ratio ΔF/F depends on the wavelength used for data collection, one can adjust the magnitude of the anomalous signal by the choice of the wavelength. Based on the composition of a given protein, one can calculate a rough estimate of the expected ΔF/F for the protein by using a generalized version of the equation published by Hendrickson and Teeter [5: Hendrickson W.A. Teeter M.M. 
Structure of the hydrophobic protein crambin determined directly from the anomalous scattering of sulphur. Nature. 1981; 290: 107-113]: where both sums run over all atoms i in the structure, Ni is the number of atoms of type i, and Δfi″ and fi are the anomalous scattering factor and the atomic form factor, respectively, at zero scattering angle of an atom of type i. If the protein does not contain strongly anomalously scattering atoms, the anomalous differences that can be observed are very small. However, the weak anomalous signal present in the native protein crystal provided by P, S, Cl−, K+, Ca2+, etc. can be sufficiently increased by the choice of an appropriate wavelength for collecting the diffraction data, and a usable signal for phase determination can be obtained. The second integral part of the method is the collection of highly redundant diffraction data. Even at the longest wavelengths considered here, the anomalous signal is still small. It is therefore necessary to collect the data at very high redundancy (20 or higher for averaged intensities) so that very accurate anomalous differences and small errors in these differences are obtained. In a previous study it was demonstrated that the success rate of anomalous substructure determination increased as data redundancy increased [7]. Our present working hypothesis is that the error or noise in the observed intensities, as described by the precision-indicating merging R factor Rp.i.m. [7], has to be significantly smaller than the observed signal, as described by the anomalous R factor Ranom.
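The estimate of ΔF/F described above can be sketched in a few lines of Python. The functional form used here is an assumption: it applies the √2 prefactor of the original Hendrickson and Teeter expression to the two sums over atom types defined in the text. The names `AtomType` and `estimate_anomalous_ratio` are hypothetical, and the scattering factors in the usage lines are made-up round numbers, not tabulated values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomType:
    """One atom type i with its count and zero-angle scattering factors."""
    count: float           # N_i, number of atoms of this type
    f0: float              # f_i, atomic form factor at zero angle (electrons)
    f_double_prime: float  # delta f_i'', anomalous term at the chosen wavelength


def estimate_anomalous_ratio(atom_types):
    """Rough estimate of dF/F.

    Assumed form (not confirmed by the source text):
        sqrt(2) * sqrt(sum_i N_i f_i''^2) / sqrt(sum_i N_i f_i^2)
    """
    anomalous = sum(a.count * a.f_double_prime ** 2 for a in atom_types)
    normal = sum(a.count * a.f0 ** 2 for a in atom_types)
    if normal <= 0:
        raise ValueError("need at least one atom with a non-zero form factor")
    return math.sqrt(2.0) * math.sqrt(anomalous) / math.sqrt(normal)


# Made-up composition for illustration only:
# 100 light atoms without anomalous signal plus 4 anomalous scatterers.
example = [AtomType(100, 6.0, 0.0), AtomType(4, 16.0, 1.0)]
ratio = estimate_anomalous_ratio(example)
print(f"estimated dF/F = {100 * ratio:.2f}%")
```

Because f″ enters linearly in the numerator, doubling the anomalous scattering factor of the only anomalous scatterer doubles the estimate, which is the lever that the choice of a longer wavelength pulls.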
With this rule of thumb, it should be possible to estimate the necessary overall redundancy after the first few diffraction images have been collected and processed. A potential problem concerning the collection of data at longer wavelengths is, however, the absorption effects encountered. It is therefore essential that one employ a suitable scaling protocol, such as the 3D-detector scaling model provided by the program SCALA of the CCP4 suite [12: CCP4. The CCP4 (Collaborative Computational Project 4) suite programs for protein crystallography. Acta Crystallogr. D. 1994; 50: 760-763], with which a pseudo-absorption correction is performed on the observed intensities. In a previous study on long-wavelength diffraction data collection [11], we were able to demonstrate that indeed the “best” anomalous differences were obtained in this way. In summary, we will demonstrate here that by collecting highly redundant diffraction data at longer wavelengths and by properly scaling the integrated intensities, one can enhance the observed anomalous signal sufficiently so that the phasing power provided by the anomalous differences obtained in a SAS experiment is large enough to yield an interpretable electron density map. The experimental details of the outlined method will be discussed in the following paragraphs describing a test case.

The metalloprotease Thermolysin provides a good example for testing the outlined method. It is commercially available, it can be easily and reproducibly crystallized from an aqueous DMSO solution [13: Matthews B.W. Jansonius J.N. Colman P.M. Schoenborn B.P. Duporque D. Three-dimensional structure of thermolysin. Nature New Biology. 1972; 238: 37-41; 14: Hilgenfeld R. 
Liesum A. Storm R. Plaas-Link A. Crystallization of two bacterial enzymes on an unmanned space mission. J. Cryst. Growth. 1992; 122: 330-336; 15: Schiefner, A. (2000). Crystallographic studies on thermolysin. Diploma thesis, University of Jena, Jena, Germany], and the resulting crystals are sufficiently well ordered and diffract to at least 1.7 Å resolution at a synchrotron source. Thermolysin, with a total of 316 amino acids (molecular weight 34.6 kDa), constitutes an average-sized protein, too large to be amenable to direct methods of structure determination but still not large enough to be out of range for the method described here. Crystalline Thermolysin contains a few anomalous scatterers that serve as the basis for our approach. These are the Zn2+ ion in the active site, up to five Ca2+ ions, and three sulfur atoms (Met120-Sδ, Met205-Sδ, and one DMSO molecule). A plot of the estimated ΔF/F values based on the equation given above versus the wavelength λ used for data collection is shown in Figure 1. The peak at the wavelength λ ≅ 1.28 Å is due to the absorption edge of the Zn2+ ion. Already at , the total anomalous signal becomes larger than the one close to the Zn edge owing to the increased contribution of the calcium ions and the sulfur atoms. Last but not least, Thermolysin crystallizes in the hexagonal space group P6122 with cell dimensions a = 92.8 Å and c = 129.8 Å, which makes the high-redundancy data collection fairly straightforward.

Diffraction data were collected from three different crystals at a temperature of 100 K at the X-ray diffraction beamline of the ELETTRA synchrotron (Trieste, Italy) at wavelengths of 1.5 Å (crystal 1), 1.7 Å (crystal 2), 1.9 Å (crystal 3), 2.1 Å (crystal 2), and 2.64 Å (crystal 3). See also Table 1 for data collection parameters and data processing statistics. Prior to flash cooling, the crystals were soaked in 20% (v/v) glycerol for about 1 hr.
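Because the text quotes both wavelengths and photon energies, a small conversion helper may be useful. The sketch below uses the standard relation E [eV] ≈ 12398.42 / λ [Å]; the function names are hypothetical and the constant is rounded. It also lists λ/3 for each data-collection wavelength, which is where third-harmonic contamination from the monochromator would appear.

```python
HC_EV_ANGSTROM = 12398.42  # h*c in eV*Angstrom, rounded


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy in eV for a wavelength given in Angstrom."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_ANGSTROM / wavelength_angstrom


def energy_ev_to_wavelength(energy_ev):
    """Wavelength in Angstrom for a photon energy given in eV."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_ANGSTROM / energy_ev


def third_harmonic_wavelength(wavelength_angstrom):
    """Wavelength of the third harmonic (three times the photon energy)."""
    return wavelength_angstrom / 3.0


# Data-collection wavelengths quoted in the text (Angstrom), labeled as in Table 1.
DATA_SET_WAVELENGTHS = {"A": 1.50, "B": 1.70, "C": 1.90, "D": 2.10, "E": 2.64}

for name, lam in DATA_SET_WAVELENGTHS.items():
    print(
        f"data set {name}: {lam:.2f} A = {wavelength_to_energy_ev(lam):.0f} eV, "
        f"third harmonic at {third_harmonic_wavelength(lam):.3f} A"
    )
```

With this constant, the energy range 8000–4000 eV mentioned in the text maps to roughly 1.55–3.10 Å.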
As in the experiment carried out by Weiss et al. [11], no changes were made to the beamline setup, except that the double gas-filled beam-position monitor at the end of the beamline was partially evacuated. At the longer wavelengths, it was also necessary to slightly misalign the second monochromator crystal in order to suppress the appearance of reflections originating from the third harmonic wavelength. Each data set consisted of a full revolution (360°) of data, which were collected as images of 0.5° each except for data set E (λ = 2.64 Å), for which 1.0° images were collected. For three data sets (A, B, and D), a 90° low-resolution pass was collected as well. It should be noted that only the first three data sets extend to the maximum resolution of 1.83 Å (Table 1). At the longer wavelengths, the minimum crystal-to-detector distance of about 40 mm for the MARCCD detector at the XRD beamline at ELETTRA prevented the collection of diffraction data out to this resolution.

Table 1. Data Collection and Processing Statistics

Data set                      A            B            C            D            E
Data collection
  Crystal                     1            2            3            2            3
  Wavelength [Å]              1.50         1.70         1.90         2.10         2.64
  Crystal-film distance [mm]  70           55           40           40           40
Data processing
  a [Å]                       92.72        92.77        92.85        92.81        92.82
  c [Å]                       129.55       129.73       130.02       129.68       129.94
  Mosaicity [°]               0.50         0.52         0.25         0.52         0.33
  Resolution limits [Å]       99.0–1.83    99.0–1.83    99.0–1.82    99.0–2.01    99.0–2.52
  Outer shell [Å]             1.88–1.83    1.88–1.83    1.87–1.82    2.06–2.01    2.59–2.52
  Total reflections           1,200,393    1,178,497    1,114,069    792,013      412,049
  Unique reflections          29,706       29,764       30,375       22,666       11,763
  Redundancy                  40.4         39.6         36.7         34.9         35.0
  Completeness [%]            100.0        99.9         100.0        100.0        100.0
  Outer shell [%]             100.0        100.0        100.0        100.0        100.0
  I/σ(I)                      8.6          10.4         12.5         7.4          9.0
  Outer shell                 8.0          6.0          3.6          4.5          1.3
  Rmerge [%] (1)              5.3          4.1          3.9          6.2          6.3
  Outer shell [%]             8.1          10.5         17.4         12.8         27.0
  Rr.i.m. [%] (2)             5.4          4.2          4.0          6.3          6.4
  Outer shell [%]             8.2          10.7         17.7         13.5         27.6
  Rp.i.m. [%] (3)             0.8          0.6          0.6          1.0          1.0
  Outer shell [%]             1.4          2.0          3.2          4.0          5.3
  Ranom [%] (4)               1.3          1.4          1.7          2.0          2.8
  Outer shell [%]             2.0          2.7          3.8          4.1          5.4

(1) Rmerge = 100 Σhkl Σi |Ii(hkl) − <I(hkl)>| / Σhkl Σi Ii(hkl)
(2) Rr.i.m. = 100 Σhkl [N/(N − 1)]^(1/2) Σi |Ii(hkl) − <I(hkl)>| / Σhkl Σi Ii(hkl)
(3) Rp.i.m. = 100 Σhkl [1/(N − 1)]^(1/2) Σi |Ii(hkl) − <I(hkl)>| / Σhkl Σi Ii(hkl)
(4) Ranom = 100 Σhkl |I(hkl) − I(−h−k−l)| / Σhkl <I(hkl)>

In accordance with our previous experience with the processing of diffraction data collected at longer wavelengths [11], we indexed and integrated the data by using the program DENZO [16: Otwinowski Z. Minor W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 1997; 276: 307-326] and scaled the data by using SCALA [12]. The scaling procedure, which was optimized to address the specific problems posed by the use of long wavelengths for diffraction experiments, consists of initial BATCH scaling followed by 3D-detector scaling. We calculated the redundancy-independent merging R factor Rr.i.m. as well as the precision-indicating merging R factor Rp.i.m. [7: Weiss M.S. Global indicators of X-ray data quality. J. Appl. Crystallogr. 
2001; 34: 130-135] by using our own program, RMERGE (available from http://www.imb-jena.de/www_sbx/projects/sbx_qual.html or from M.S.W. upon request). Structure factor amplitudes were then calculated with the program TRUNCATE [17: French G.S. Wilson K.S. On the treatment of negative intensity observations. Acta Crystallogr. A. 1978; 34: 517-525]. Data-processing and scaling statistics for the five data sets are shown in Table 1. From the merging statistics (Table 1) it can be inferred that the anomalous signal as described by Ranom increases, as expected, with increasing wavelength for data collection. However, Ranom has to be viewed relative to the overall precision of the averaged intensities as described by the precision-indicating merging R factor Rp.i.m. [7]. If Ranom constitutes the signal and Rp.i.m. constitutes the noise, then data set C exhibits the largest signal-to-noise ratio. The use of Ranom as a signal may not have a clear statistical basis because Ranom is not free of measurement errors. Nevertheless, it is deemed practical to do so because the whole substructure determination and phasing procedure relies entirely on the anomalous differences measured. Another approach to assessing the strength of the anomalous signal was the inspection of the anomalous Patterson function. In order to make the results from the different data sets comparable, we carried out the whole analysis at resolutions of 1.83 Å (data sets A–C), 2.01 and 2.20 Å (data sets A–D), and 2.52 Å (data sets A–E). The peak height of Ca-2 (the strongest calcium ion in the structure) was chosen as the measure. At all resolutions tried, data set C exhibited the maximum peak height for Ca-2, thus supporting the notion already gained from the merging statistics.
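The merging statistics defined in the footnotes of Table 1 can be sketched as follows. This is a minimal illustration, not the RMERGE program: the function names and the input layout (a dict mapping each unique reflection to its observed intensities) are assumptions, reflections measured only once are skipped because the (N − 1) terms are undefined for them, and <I(hkl)> in Ranom is taken as the mean of the Bijvoet pair. The last lines apply the Ranom/Rp.i.m. reading of signal-to-noise to the overall values listed in Table 1.

```python
import math


def merging_r_factors(observations):
    """Return (Rmerge, Rr.i.m., Rp.i.m.) in percent.

    observations: dict mapping hkl -> list of intensities I_i(hkl).
    """
    dev_merge = dev_rim = dev_pim = total = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # assumption: singly measured reflections are ignored
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        dev_merge += abs_dev
        dev_rim += math.sqrt(n / (n - 1)) * abs_dev
        dev_pim += math.sqrt(1 / (n - 1)) * abs_dev
        total += sum(intensities)
    if total == 0:
        raise ValueError("no multiply measured reflections with intensity")
    return tuple(100.0 * d / total for d in (dev_merge, dev_rim, dev_pim))


def r_anom(bijvoet_pairs):
    """Ranom in percent from a dict hkl -> (I(hkl), I(-h-k-l))."""
    numerator = sum(abs(plus - minus) for plus, minus in bijvoet_pairs.values())
    # assumption: <I(hkl)> is the mean of the two Bijvoet mates
    denominator = sum((plus + minus) / 2.0 for plus, minus in bijvoet_pairs.values())
    return 100.0 * numerator / denominator


# Overall values from Table 1 (percent).
RANOM = {"A": 1.3, "B": 1.4, "C": 1.7, "D": 2.0, "E": 2.8}
RPIM = {"A": 0.8, "B": 0.6, "C": 0.6, "D": 1.0, "E": 1.0}

signal_to_noise = {name: RANOM[name] / RPIM[name] for name in RANOM}
best = max(signal_to_noise, key=signal_to_noise.get)
print(best, round(signal_to_noise[best], 2))  # prints: C 2.83
```

On these numbers data set E comes a close second (2.80), which agrees with the observation that the signal keeps growing with wavelength while the precision stops improving.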
Also, the anomalous-difference Fourier synthesis based on the observed anomalous differences and phases taken from a refined model was checked. As above, the analysis was carried out at four different resolutions. A Thermolysin model at 100 K refined to 1.8 Å resolution [15] was used as a starting model for a standard refinement protocol against each data set. This protocol consisted of 20 cycles of rigid-body refinement and 20 cycles of standard restrained refinement with the program REFMAC [12]. The model phases of the obtained structure were then used for calculating an anomalous-difference Fourier map. As judged from the peak height of the strongest calcium ion in the anomalous-difference Fourier map, data set C turned out to be the best one at all resolutions tried, again corroborating the results from data-processing statistics and from the inspection of the anomalous Patterson functions.

The substructure of the anomalously scattering atoms was determined with the program Shake-and-Bake (SnB) version 2.0 [18: Weeks C.M. Miller M. The design and implementation of SnB v 2.0. J. Appl. Crystallogr. 1999; 32: 120-124]. E values were computed from the anomalous differences as described in [7]. Then, using 3,000 reflections and 30,000 invariants, we searched for a total of ten sites of anomalously scattering atoms in 50 SnB cycles. Usually, 1,000 trials were conducted. For all five data sets and all four high-resolution limits tried, the search was successful.
The highest success rate was observed for data set C at the resolution limit of 2.52 Å, with 85 successful trials out of 1,000. However, the discrimination between the correct solutions and the incorrect solutions was better when the substructure determination was carried out at higher resolution. A typical SnB histogram is shown in Figure 2. It does not exhibit the expected bimodal distribution but rather a sort of trimodal distribution. Presumably, in a large number of trials, partially correct solutions were returned that scored much better than the completely wrong solutions. It is conceivable that these partial solutions could be transformed into complete solutions if larger numbers of SnB cycles were employed. Alternatively, one might achieve the completion of the substructure by inspecting residual electron density maps. All fifteen top peaks returned by SnB were submitted without manual editing to the program MLPHARE [12] for parameter refinement. All atoms were treated as sulfur atoms, and only the anomalous occupancy was refined. This was shown to be sufficient in previous experiments with hen egg-white Lysozyme (our unpublished data). Again, data set C yielded the best statistics when RCullis as defined in the program MLPHARE [12], i.e., lack-of-closure divided by the anomalous difference, was used as an indicator. The lowest RCullis was obtained at 2.52 Å resolution (RCullis = 50%), but even at 1.83 Å resolution it was still as low as 60%, indicating phases of relatively high quality.
The correlation coefficients (calculated with the program OVERLAPMAP) of the corresponding electron density maps to the final refined maps turned out to be between 0.41 (data set A, ) and 0.51 (data set C at all resolutions). The phases from MLPHARE were directly passed on to a solvent-flattening procedure carried out with the program DM [12]. This procedure included histogram matching and the multiresolution application (keywords SOLV, HIST, MULT). The number of cycles was determined automatically based on the observed decrease of the real-space free residual. Thereafter, the correlation coefficients to the final refined maps (Figure 3a) were between 0.58 (data set A, dmin = 2.52 Å) and 0.80 (data set C, ). At this stage the benefit of the higher-resolution data became clearly evident.

The phases returned by the solvent-flattening procedure were used for calculating a starting electron density map for automated model building and refinement by the wARP procedure [19: Perrakis A. Morris R. Lamzin V.S. Automated protein model building combined with iterative structure refinement. Nature Struct. Biol. 1999; 6: 458-463]. They were first improved by 15 cycles of free-atom density modification; then 120 cycles of the standard warpNtrace procedure, with complete rebuilding of the model every eighth cycle, were run. After 120 cycles, the amino acid sequence was matched to the resulting model. A structure was defined to be determined completely if more than 90% of the amino acids were built into the electron density with a confidence level of at least 80%. The five data sets were again analyzed at four different resolutions (Figure 3).
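The map correlation coefficients quoted above compare two electron density maps sampled on the same grid. A minimal sketch of such a comparison is the Pearson correlation over all grid points; this illustrates the general idea only and is not the OVERLAPMAP implementation. The function name and the flat-list input are assumptions.

```python
import math


def map_correlation(map_a, map_b):
    """Pearson correlation between two maps given as equal-length sequences
    of density values sampled at the same grid points."""
    if len(map_a) != len(map_b) or len(map_a) == 0:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat map")
    return covariance / math.sqrt(var_a * var_b)


# Toy example: an "experimental" map that partly follows a "reference" map.
reference = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
experimental = [0.2, 0.8, 1.5, 3.1, 2.4, 0.6]
print(f"map correlation = {map_correlation(reference, experimental):.2f}")
```

The coefficient is insensitive to the overall scale and offset of either map, which is why it is a convenient way to compare experimentally phased maps with maps from a refined model.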
The numbers in bold in Figure 3a depict the successful automated structure determinations, whereas the others show the cases that did not yield a complete structure and were therefore not successful in this respect. All three data sets A–C resulted in a complete structure at dmin = 1.83 Å; data sets C and D yielded a partially complete model after 120 cycles at , which could be completed by employing 120 more warpNtrace cycles. It could also be shown that the combination of phases of limited resolution with amplitudes extending to 1.83 Å resolution leads to successful automated structure determination. For data sets A–C, the phases calculated to 2.52 Å, 2.20 Å, and 2.01 Å were combined with the respective 1.83 Å amplitudes, and for data sets D and E, the phases were combined with the 1.83 Å amplitudes of data set A. The high-resolution amplitudes were introduced for the solvent-flattening procedure, in which they were able to contribute significantly to the quality of the final phases. The combinations that led to a successful automated structure determination are shown as bold numbers in Figure 3b. Almost all experiments were then successful, except for data sets A and E at a phase resolution of 2.52 Å and data set A with 2.20 Å phases. In the case of data set A, the anomalous signal was obviously too weak, and in the case of data set E the measured anomalous differences presumably suffered from overly large absorption errors for which the scaling stage could not fully compensate. However, these results demonstrate that it may be feasible to extend the described sulfur SAS approach to a dual-wavelength approach in which the long-wavelength data set provides the anomalous differences and the initial phasing information and the short-wavelength, high-resolution data set provides the amplitudes and the information for phase extension and automated model building and refinement.
In terms of data quality and signal-to-noise ratio, data set C (λ = 1.9 Å) is the best data set of the ones tried. This is reflected in the highest success rate for SnB substructure determination and in the highest map correlation coefficients between the experimental electron density map and the one based on model phases. It may be the case that data collection at λ = 1.9 Å constitutes a good compromise between enhancement of the anomalous signal and the severity of the encountered absorption error. Beyond 1.9 Å the pseudo-absorption correction employed for scaling the integrated intensities may not be able to fully compensate for the increasing absorption effects despite the fact that the signal also increases. In all cases tried it was possible to solve the anomalous substructure and to calculate phases. With map correlation coefficients of up to 0.80 after solvent flattening, most of the phase sets were of sufficient quality to allow interpretation. In some cases, the combination of sufficiently enhanced anomalous signal and maximum nominal resolution of about 2.0 Å even provided the basis for a successful automated structure determination, and in almost all cases the automated structure determination protocol was successful when phases of limited resolution were combined with amplitudes of high resolution.

As of May 29, 2001, the Protein Data Bank [20: Berman H.M. Bourne P.E. et al. The Protein Data Bank. Nucl. Acids Res. 2000; 28: 235-242] contained 12,432 crystallographically determined structures. Of these, 5,753 (46.3%) were determined at a resolution of 2.0 Å or better, and 3,046 (24.5%) were determined at a resolution of 1.8 Å or better. In other words, taking only the resolution criterion into account, one could in principle determine almost half of all structures in the PDB by our approach.
The results presented here demonstrate that it is possible to solve the structure of a 35 kDa protein based on just one diffraction data set by making use of the weak anomalous signal present. At 35 kDa this example is about 50% larger than Obelin, which previously constituted the largest protein for which a similar feat was accomplished [8]. Further refinement and optimization of the method may then even make protein structures of molecular weight of up to 50 or 60 kDa amenable to this approach. The fact that Thermolysin contains metal ions such as zinc and calcium certainly facilitated the structure solution by this approach to some extent. It may be argued that one could also have solved the structure by using anomalous data collected at or near the absorption edge of zinc. This is indeed the case, but the phases obtained from anomalous differences collected at that wavelength are slightly inferior to the ones obtained from the anomalous differences of data set C (data not shown), which is in accord with the estimated anomalous diffraction ratios displayed in Figure 1. It should also be noted that Thermolysin contains an unusually small number of methionine residues and no cysteine residues. Given an average amino acid composition of a protein, as can for instance be calculated from sequence data [21: McCaldon P. Argos P. Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. Proteins. 1988; 4: 99-122], a protein contains 2.4 methionine and 1.7 cysteine residues per 100 amino acids on average.
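As a rough plausibility check, the average composition quoted above can be inserted into the Hendrickson and Teeter form of the estimate. The numbers used here are assumptions that do not come from the text: about 7.8 non-hydrogen atoms per residue (so N_P ≈ 780 per 100 residues), an effective scattering factor Z_eff ≈ 6.7 e per protein atom, and f″ ≈ 0.8 e for sulfur near 1.9 Å. The sulfur count N_S = 2.4 + 1.7 = 4.1 per 100 residues follows from the composition given in the text.

```latex
\frac{\langle \Delta F \rangle}{\langle F \rangle}
  \approx \sqrt{2}\,\frac{\sqrt{N_S}\; f''_S}{\sqrt{N_P}\; Z_\mathrm{eff}}
  = \sqrt{2}\,\frac{\sqrt{4.1}\times 0.8}{\sqrt{780}\times 6.7}
  \approx 0.012
```

Under these assumptions the ratio comes out at about 1.2%, close to the 1.3% estimate stated in the text; the exact figure depends on the tabulated f″ value used.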
Based on such a composition, an anomalous diffraction ratio of 1.3% at a data collection wavelength of about 1.9 Å can be estimated with the equation presented above. Even though there is still a lot of room for improvement of the currently used methods and protocols at several stages, the results presented suggest how, using only currently available, noncommercial crystallographic computer programs, one can design a modular software architecture that, if implemented on synchrotron beamlines, could provide the basis for automated macromolecular structure determination in situ. It can be expected that the various structural-genomics projects undertaken worldwide will produce a large number of structure determinations in the next decade, of which a significant fraction will have to be carried out automatically. The outlined approach or an improved version of it may have the potential to become one of the standard approaches to accomplishing this feat.

A method is described that in our opinion has the potential to become a valuable addition to the still small arsenal of procedures that can be employed to determine a novel intermediate-sized protein structure completely automatically. The three main features of the method are: (i) the use of soft X-rays in the wavelength range for diffraction data collection, (ii) the collection of highly redundant diffraction data, and (iii) a suitable scaling protocol for performing a pseudo-absorption correction. This approach was tested on hexagonal crystals of the Zn-metalloprotease Thermolysin. The complete anomalous substructure (one Zn2+, five Ca2+, three S) could be determined based on anomalous intensity differences as small as 1.3%. All combinations of data collection wavelength and maximum resolution tried were successful. Phase determination was then carried out with standard programs, and the model was built automatically.
The only manual intervention necessary was the shuffling of the data between the different computer programs. No evaluation of the intermediate results was necessary at any stage. After the autobuilding stage, the model was nearly complete, and building of the remaining parts appeared to be straightforward. The time required for completing and checking the model on a graphics terminal was estimated to be less than an hour or two. The described approach requires only modestly high-resolution data; it should therefore be applicable to a wide range of protein crystals that contain up to 35 kDa (or maybe even more) in their asymmetric unit.

We would like to thank Dr. Kristina Djinovic-Carugo and the X-ray diffraction staff of the ELETTRA synchrotron (Trieste, Italy) for providing data collection facilities and for their help during data collection. This work was in part supported by Bundesministerium für Bildung und Forschung (through project code DESY-HS), grant 05SH8BJA1 to R.H., who also thanks the Fonds der Chemischen Industrie for support.
