Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor

Arun Kejariwal, Hideki Saito, Alexander V Veidenbaum, Alexandru Nicolau, Utpal Banerjee, Xinmin Tian, Milind Girkar

doi:10.1109/icsamos.2008.4664856

Abstract

SPEC CPU benchmarks are commonly used by compiler writers and architects of general-purpose processors for performance evaluation. Since the release of the CPU89 suite, the SPEC CPU benchmark suites have evolved, with applications removed, added, or upgraded. These changes influence design decisions for next-generation compilers and microarchitectures. It is therefore critical to characterize the applications in the new suite, SPEC CPU2006, to guide the decision-making process. Although similar studies have been done on the retired SPEC CPU benchmark suites, to the best of our knowledge a thorough performance characterization of CPU2006 and its comparison with CPU2000 has not been done so far. In this paper, we present such a characterization and comparison. We compiled the applications in CPU2000 and CPU2006 using the Intel® Fortran/C++ optimizing compiler and executed them, using the reference data sets, on the state-of-the-art Intel Core™ 2 Duo processor. The performance information was collected using the Intel VTune™ performance analyzer, which takes advantage of the built-in hardware performance counters to obtain accurate information on program behavior and its use of processor resources. The focus of this paper is on branch and memory access behavior, the well-known causes of program performance problems. By analyzing and comparing the L1 data and L2 cache miss rates, branch prediction accuracy, and resource stalls, the performance impact in each suite is indirectly determined and described. Not surprisingly, the CPU2006 codes are larger and more complex, and they have larger data sets. This leads to higher average L2 cache miss rates and a slight reduction in average IPC compared to the CPU2000 suite. Similarly, the average branch behavior is slightly worse in the CPU2006 suite. However, based on processor stall counts, branches are much less of a problem. The results presented here are a step towards understanding the SPEC CPU2006 benchmarks and will aid compiler writers in understanding the impact of currently implemented optimizations and in designing new ones to address the new challenges presented by SPEC CPU2006. Similar opportunities exist for architecture optimization.
