Beryllium is an important nuclear material, and reliable data for neutron-induced nuclear reactions of beryllium are of significant importance for nuclear engineering. The evaluated nuclear data for beryllium have been revised from ENDF/B-VI to ENDF/B-VII.0 and then to ENDF/B-VII.1. Comparing calculated and experimental results of criticality benchmark experiments is an essential means of testing the reliability of nuclear data and of indicating the direction in which they should be improved. Several series of criticality benchmark experiments with beryllium reflectors are available for testing beryllium nuclear data. However, the calculated results are not consistent across these benchmarks. Two similar series, HMF058 and HMF066, are selected for discussion. HMF058 and HMF066 are both highly enriched metal fast benchmarks, with five cases in HMF058 and nine in HMF066. With ENDF/B-VII.1 cross sections, the C/E keff bias clearly increases with beryllium reflector thickness for the five cases in HMF058, whereas with ENDF/B-VII.0 cross sections all the C/E values for keff remain within the experimental uncertainty. Conversely, the HMF066 cases are calculated very well with ENDF/B-VII.1 cross sections but show a bias of about 500 pcm with ENDF/B-VII.0 data. These results are particularly puzzling because the configurations of HMF058 and HMF066 differ little; as a result, the quality of the beryllium nuclear data cannot be assessed, and the direction for improvement cannot be determined. The similarity method, based on sensitivity coefficients calculated by the sensitivity and uncertainty code SURE, is used to analyze the similarity between the two series of benchmark experiments. First, the neutronics similarity index is calculated between each pair of the fourteen cases from the two benchmarks.
Then, for each of the five HMF058 cases, the most similar HMF066 cases are selected by similarity index, and the experiments are grouped into five similarity suites, each containing one HMF058 case and the remaining members from HMF066. The experiments in the same similarity suite are highly similar to one another in neutronics. Within a similarity suite, the deviations between calculated and experimental values disagree for experiments from different series but agree for experiments from the same series. This shows that the agreement between calculated results and experimental values cannot be improved by revising the nuclear data. It is necessary to carry out a detailed reevaluation of the benchmark experiments, or to develop reliable new integral experiments that can identify and exclude unreliable ones, so that nuclear data testing is not misled.
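The suite-building procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SURE implementation: the abstract does not say which similarity index SURE computes, so the sketch assumes the widely used covariance-weighted correlation index c_k, c_k = S_A^T C S_B / sqrt((S_A^T C S_A)(S_B^T C S_B)), where S_A and S_B are relative keff sensitivity vectors and C is a relative cross-section covariance matrix. The function names `ck_index`, `build_suites` and `ce_bias_pcm`, the selection `threshold` of 0.9, and the toy vectors are all hypothetical and do not come from the benchmarks.

```python
import numpy as np


def ck_index(s_a: np.ndarray, s_b: np.ndarray, cov: np.ndarray) -> float:
    """Assumed covariance-weighted similarity index c_k between two systems.

    s_a, s_b: relative keff sensitivity vectors (one entry per
    nuclide/reaction/energy group); cov: relative covariance matrix.
    With cov = identity this reduces to the cosine of the two vectors.
    """
    ab = s_a @ cov @ s_b
    aa = s_a @ cov @ s_a
    bb = s_b @ cov @ s_b
    return float(ab / np.sqrt(aa * bb))


def build_suites(sens, cov, ref_cases, candidate_cases, threshold=0.9):
    """Group each reference case with candidate cases whose c_k >= threshold.

    The threshold is a hypothetical selection rule; the members are listed
    from most to least similar, with the reference case first.
    """
    suites = {}
    for ref in ref_cases:
        scored = [(ck_index(sens[ref], sens[c], cov), c) for c in candidate_cases]
        members = [c for ck, c in sorted(scored, reverse=True) if ck >= threshold]
        suites[ref] = [ref] + members
    return suites


def ce_bias_pcm(k_calc: float, k_exp: float) -> float:
    """Deviation of calculated from experimental keff, in pcm."""
    return (k_calc / k_exp - 1.0) * 1e5


if __name__ == "__main__":
    # Toy 3-group sensitivities with an identity covariance (illustrative only).
    cov = np.eye(3)
    sens = {
        "ref-1": np.array([1.0, 0.0, 0.0]),
        "cand-1": np.array([1.0, 0.1, 0.0]),
        "cand-2": np.array([0.0, 1.0, 0.0]),
    }
    print(build_suites(sens, cov, ["ref-1"], ["cand-1", "cand-2"]))
    print(ce_bias_pcm(1.005, 1.0))
```

Within each suite, comparing the `ce_bias_pcm` values of members from the same series against those from the other series is the consistency check the abstract describes; highly similar cases that nonetheless show opposite biases point to the experiments rather than the nuclear data.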