Improvements in free R cross-validation are based on changed scaling procedures and the use, in map calculation, of estimates of the validation amplitudes which are independent of the actual observed values. The deleterious effects of the omitted test data are mitigated by reduction of the test-set size, which is made possible by constraining test and working sets to share the same scaling coefficients, thereby reducing the degrees of freedom and the dependence of free R on data selection. Further improvements come with the use of a modified free R factor, R freeTA. Instead of omitting the validation reflections from map calculation, their amplitudes are replaced by the average of their resolution peers, which is (nearly) independent of the actual cross-validation amplitudes. The improvements are relevant to model building, to phase refinement by density modification and especially to real-space refinement. For real data at about 3 Å resolution, free R factors of about 0.25 are affected little, but the precision of the structure is improved by about 0.1 Å. Tests with simulated data show that when agreement between observed and calculated amplitudes is good (as in very high resolution studies or simulated refinement tests), free R factors can be improved by factors greater than two.
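The two ideas in the abstract, a scale shared by the working and test sets and the replacement of test amplitudes by a resolution-peer average, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the pre-assigned integer shell index per reflection, the single overall least-squares scale, and the use of the working-set mean as the "peer average" are all assumptions made for illustration; the abstract does not specify the binning or the functional form of the scaling.

```python
from collections import defaultdict


def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the given reflections."""
    num = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)


def shared_scale(f_obs, f_calc):
    """Assumed form: one least-squares scale k fitted to all reflections,
    so that the working and test sets share the same coefficient."""
    num = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    return num / sum(fc * fc for fc in f_calc)


def substitute_test_amplitudes(shell, f_obs, is_test):
    """Return amplitudes for map calculation in which each test reflection
    is replaced by the mean working-set amplitude of its resolution shell.
    The observed test values themselves never enter the result."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, fo, t in zip(shell, f_obs, is_test):
        if not t:
            totals[s] += fo
            counts[s] += 1

    out = []
    for s, fo, t in zip(shell, f_obs, is_test):
        if not t:
            out.append(fo)
        elif counts[s] == 0:
            raise ValueError(f"shell {s} has no working-set reflections")
        else:
            out.append(totals[s] / counts[s])
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two shells, one test reflection in each.
    shell = [0, 0, 0, 1, 1, 1]
    f_obs = [10.0, 12.0, 20.0, 4.0, 6.0, 9.0]
    f_calc = [9.0, 13.0, 18.0, 5.0, 5.0, 8.0]
    is_test = [False, False, True, False, False, True]

    map_amplitudes = substitute_test_amplitudes(shell, f_obs, is_test)
    k = shared_scale(f_obs, f_calc)

    # Free R is evaluated on the test reflections only, with the shared scale.
    test_obs = [fo for fo, t in zip(f_obs, is_test) if t]
    test_calc = [fc for fc, t in zip(f_calc, is_test) if t]
    print(map_amplitudes, k, r_factor(test_obs, test_calc, k))
```

In this sketch the substituted amplitudes depend only on working-set reflections, which is one simple way to keep the map independent of the cross-validation data; a production implementation would presumably use finer resolution binning and a resolution-dependent scale.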