Commercial Workloads Research Articles

This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences which simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads such as multiprocess compiles and the SPEC92 benchmarks running on Silicon Graphic's IRIX 5.3 at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example, Embra can customize its generated code to include a processor cache model which allows it to compute the cache misses and memory stall time of a workload. Customized code generation allows Embra to simulate a machine with caches at slowdowns of only a factor of 7 to 20. Most of the statistics generated at this speed match those produced by a slower reference simulator to within 1%. This paper describes the techniques used by Embra to achieve high performance, focusing on the requirements unique to machine simulation, including modeling the processor, memory management unit, and caches. In order to study Embra's memory system performance we use the SimOS simulation system to examine Embra itself. We present a detailed breakdown of Embra's memory system performance for two cache hierarchies to understand Embra's current performance and to show that Embra's implementation techniques benefit significantly from the larger cache hierarchies that are becoming available. Embra has been used for operating system development and testing as well as for studies of computer architecture. In this capacity it has simulated large, commercial workloads including IRIX running a relational database system and a CAD system for billions of simulated machine cycles.

Experience has shown that many widely used benchmarks are poor predictors of the performance of systems running commercial applications. Research into this anomaly has long been hampered by a lack of address traces from representative multi-user commercial workloads. This paper presents research, using traces of industry-standard commercial benchmarks, which examines the characteristic differences between technical and commercial workloads and illustrates how those differences affect cache performance. Commercial and technical environments differ in their respective branch behavior, operating system activity, I/O, and dispatching characteristics. A wide range of uniprocessor instruction and data cache geometries were studied. The instruction cache results for commercial workloads demonstrate that instruction cache performance can no longer be neglected because these workloads have much larger code working sets than technical applications. For database workloads, a breakdown of kernel and user behavior reveals that the application component can exhibit behavior similar to the operating system and therefore, can experience miss rates equally high. This paper also indicates that “dispatching” or process switching characteristics must be considered when designing level-two caches. The data presented shows that increasing the associativity of second-level caches can reduce miss rates significantly. Overall, the results of this research should help system designers choose a cache configuration that will perform well in commercial markets.

Commercial Workloads Research Articles

Articles published on Commercial Workloads

Characterization of workloads for distributed DB/DC-Processing

Embra

Modelling test data for performance evaluation of large parallel database machines

Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Commercial workload performance in the IBM POWER2 RISC System/6000 processor

Characterization of alpha AXP performance using TP and SPEC workloads

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Commercial Workloads Research Articles

Articles published on Commercial Workloads

Characterization of workloads for distributed DB/DC-Processing

Embra

Modelling test data for performance evaluation of large parallel database machines

Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Commercial workload performance in the IBM POWER2 RISC System/6000 processor

Characterization of alpha AXP performance using TP and SPEC workloads