Abstract

Performance evaluation is a key task in computing and communication systems. Benchmarking is one of the most common techniques for evaluation purposes, where the performance of a set of representative applications is used to infer system responsiveness in a general usage scenario. Unfortunately, most benchmarking suites are limited to a reduced number of applications and, in some cases, to rigid execution configurations. This makes it hard to extrapolate performance metrics for a general-purpose architecture, which is expected to have a multi-year lifecycle while running dissimilar applications concurrently. The main culprit of this situation is that current benchmark-derived metrics lack generality and statistical soundness, and fail to represent general-purpose environments. Previous attempts to overcome these limitations through random app mixes significantly increase computational cost (the workload population grows rapidly), making the evaluation process barely affordable. To circumvent this problem, in this article we present a more elaborate performance evaluation methodology named BenchCast. Our proposal provides more representative performance metrics at a drastically reduced computational cost, limiting app execution to a small and representative fraction marked through code annotation. Thanks to this labeling, and making use of synchronization techniques, we generate heterogeneous workloads in which every app runs simultaneously inside its Region Of Interest (ROI), making a few seconds of execution highly representative of the full application run.
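To make the idea concrete, the sketch below shows one possible way to combine ROI labeling with cross-process synchronization: each annotated application blocks on a process-shared barrier when it reaches its Region Of Interest, so all co-scheduled apps enter their ROIs at the same time and measurements cover only that overlapping window. This is a minimal illustration, not BenchCast's actual interface: the function names, the shared-memory object name, and the launcher that creates the barrier are assumptions.

```c
/*
 * Minimal sketch of ROI annotation plus cross-process synchronization.
 * Assumptions (hypothetical, for illustration only):
 *  - a workload launcher has created a pthread barrier in the shared-memory
 *    object "/benchcast_barrier", initialized with PTHREAD_PROCESS_SHARED
 *    and a count equal to the number of co-scheduled applications;
 *  - the benchcast_* names are placeholders, not the paper's real API.
 * Build with: cc roi_sketch.c -o roi_sketch -lpthread -lrt
 */
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

static pthread_barrier_t *roi_barrier;   /* lives in shared memory */

/* Attach to the process-shared barrier created by the launcher. */
static void benchcast_attach(void)
{
    int fd = shm_open("/benchcast_barrier", O_RDWR, 0600);
    roi_barrier = mmap(NULL, sizeof(*roi_barrier),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
}

/* Every co-scheduled app blocks here until all have reached their ROI. */
static void benchcast_roi_begin(void)
{
    pthread_barrier_wait(roi_barrier);
}

/* Leaving the ROI: measurement for this app would stop here. */
static void benchcast_roi_end(void)
{
    /* e.g. notify the launcher so it can stop reading performance counters */
}

int main(void)
{
    benchcast_attach();

    /* ... application setup / warm-up phase (not measured) ... */

    benchcast_roi_begin();
    /* Representative kernel: the few seconds that stand in for the full run. */
    for (volatile long i = 0; i < 100000000L; i++)
        ;
    benchcast_roi_end();

    return 0;
}
```

Under these assumptions, the launcher only needs to start the annotated binaries and wait for the barrier to release; whatever it measures between ROI entry and exit reflects a window where all applications are simultaneously inside their representative phases.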

Highlights

  • Reaching nowadays the 50th anniversary of the commercialization of the first CPU-on-a-chip, benchmark-based processor evaluation still presents important drawbacks

  • In this work we present a processor evaluation methodology suitable for both performance and microarchitectural analyses

  • Taking advantage of some basic execution features present in many applications, we identified, labeled and synchronized the execution of their Region of Interest (ROI)


Summary

INTRODUCTION

Reaching nowadays the 50th anniversary of the commercialization of the first CPU-on-a-chip [1], we […]. Despite the recent emergence of domain-specific processors [2] (led by GPU computing for deep-learning applications), the general-purpose computing model still constitutes a relevant fraction of the semiconductor market. In this computing model, the processor runs applications […] in order to define a reduced set of applications that are sufficiently representative of a much broader usage scenario, corresponding to a specific target environment. To the best of our knowledge, this technique is usually employed with a single benchmark suite, and parallel execution relies merely on launching every application […]. This approach has important drawbacks. First, the number of applications under evaluation is usually limited to a few tens; the number of values is usually below the recommended limit to reach a reasonable confidence margin in the evaluation process. Most of the CPU market share corresponds to environments (desktop, cloud computing) where there is limited control of the kind of applications […]. We extend processor evaluation to micro-architectural parametrization (SMT and hardware prefetching), to prove that the technique is suitable to enhance understanding of the effect of these techniques.

MOTIVATION
Workload Generation and Execution
Methodology Validation
SYSTEM EVALUATION THROUGH BENCHCAST
System-wide Performance
Simultaneous Multithreading
Hardware Prefetching
RELATED WORK
Findings
CONCLUSIONS