Abstract
Performance evaluation is a key task in computing and communication systems. Benchmarking is one of the most common techniques for evaluation purposes, where the performance of a set of representative applications is used to infer system responsiveness in a general usage scenario. Unfortunately, most benchmarking suites are limited to a small number of applications and, in some cases, to rigid execution configurations. This makes it hard to extrapolate performance metrics for a general-purpose architecture that is expected to have a multi-year lifecycle and to run dissimilar applications concurrently. The main culprit is that current benchmark-derived metrics lack generality and statistical soundness, and fail to represent general-purpose environments. Previous attempts to overcome these limitations through random app mixes significantly increase the computational cost of evaluation (the workload population grows sharply), making the process barely affordable. To circumvent this problem, in this article we present a more elaborate performance evaluation methodology named BenchCast. Our proposal provides more representative performance metrics at a drastically reduced computational cost by limiting app execution to a small, representative fraction marked through code annotation. Thanks to this labeling, and making use of synchronization techniques, we generate heterogeneous workloads in which every app runs simultaneously inside its Region of Interest (ROI), making a few seconds of execution highly representative of the full application run.
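To make the idea concrete, the sketch below is a minimal illustration of ROI-aligned execution, not the actual BenchCast implementation: a launcher forks a few placeholder applications and uses a process-shared POSIX barrier so that none of them enters its Region of Interest until all have finished their warm-up phase. The names N_APPS and run_app, the sleep-based warm-up, and the busy loop standing in for the measured kernel are assumptions made for the example.

    /* Minimal sketch (not the BenchCast API): a launcher forks N placeholder
     * "applications" and uses a process-shared POSIX barrier so that each one
     * enters its Region of Interest (ROI) only when all of them have finished
     * warming up. N_APPS, run_app() and the busy loop are illustrative only. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define N_APPS 4                        /* size of the workload mix (assumed) */

    static pthread_barrier_t *roi_barrier;  /* shared by all forked processes */

    static void run_app(int id) {
        /* Warm-up / initialization phase: excluded from measurement. */
        usleep(100000 * (id + 1));          /* apps reach their ROI at different times */

        pthread_barrier_wait(roi_barrier);  /* ROI begin: every app is aligned here */
        /* Representative steady-state phase: only this window would be measured,
         * e.g. by starting performance counters once the barrier releases. */
        volatile long acc = 0;
        for (long i = 0; i < 50000000L; i++)
            acc += i;
        /* ROI end: the harness would stop counters and collect metrics here. */
        printf("app %d left its ROI\n", id);
        exit(0);
    }

    int main(void) {
        /* The barrier lives in shared memory so it is visible across fork(). */
        roi_barrier = mmap(NULL, sizeof(*roi_barrier), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pthread_barrierattr_t attr;
        pthread_barrierattr_init(&attr);
        pthread_barrierattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_barrier_init(roi_barrier, &attr, N_APPS);

        for (int i = 0; i < N_APPS; i++)
            if (fork() == 0)
                run_app(i);
        for (int i = 0; i < N_APPS; i++)
            wait(NULL);
        return 0;
    }

In an actual setup, hardware performance counters would be started right after the barrier releases and stopped when the first application leaves its ROI, so that the measured window covers only the interval in which every application is inside its representative region.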
Highlights
Fifty years after the commercialization of the first CPU-on-a-chip, benchmarking-based processor evaluation still presents important drawbacks
In this work we presented a processor evaluation methodology suitable for both performance and microarchitectural analyses
Taking advantage of some basic execution features present in many applications, we identified, labeled and synchronized the execution of their Region of Interest (ROI)
Summary
Fifty years after the commercialization of the first CPU-on-a-chip [1], and despite the recent emergence of domain-specific processors [2] (led by GPU computing for deep-learning applications), the general-purpose computing model still constitutes a relevant fraction of the semiconductor market. In this computing model, the processor runs dissimilar applications, and benchmarking is employed in order to define a reduced set of applications that are sufficiently representative of a much broader usage scenario, corresponding to a specific target environment. This approach has important drawbacks. First, the number of applications under evaluation is usually limited to a few tens, so the number of values is usually below the recommended limit to reach a reasonable confidence margin in the evaluation process. In addition, most of the CPU market share corresponds to environments (desktop, cloud computing) where there is limited control of the kind of applications being executed. To the best of our knowledge, this technique is usually employed with a single benchmark suite, and parallel execution relies merely on launching every application concurrently. We also extend the evaluation to micro-architectural parametrization (SMT and hardware prefetching) to prove that the technique is suitable for improving the understanding of the effect of these features.
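Since SMT and hardware prefetching are evaluated as micro-architectural parameters, each run needs to be tagged with the configuration under test so metric differences can be attributed correctly. As a small, hedged illustration (not part of the paper's tooling), the snippet below reads the standard Linux sysfs SMT control file; hardware-prefetcher knobs are vendor-specific (typically MSR-based) and are omitted.

    /* Minimal sketch: record the SMT configuration before launching a workload
     * mix, so every measurement can be tagged with the machine state. Reads the
     * standard Linux sysfs SMT control file; prefetcher settings are
     * vendor-specific MSRs and are intentionally not shown. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char state[32] = "unknown";
        FILE *f = fopen("/sys/devices/system/cpu/smt/control", "r");
        if (f) {
            if (fgets(state, sizeof state, f))
                state[strcspn(state, "\n")] = '\0';    /* strip trailing newline */
            fclose(f);
        }
        printf("SMT state for this run: %s\n", state); /* e.g. "on", "off", "notsupported" */
        return 0;
    }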