HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework

Subhankar Pal, Trevor Mudge, Bjorn Franke, Kuba Kaszyk, Ronald G Dreslinski, Siying Feng, Michael O'Boyle, Murray Cole

doi:10.1109/iiswc50251.2020.00011

Abstract

The rising complexity of large-scale heterogeneous architectures, such as those composed of off-the-shelf processors coupled with fixed-function logic, has imposed challenges for traditional simulation methodologies. While prior work has explored trace-based simulation techniques that offer good tradeoffs between simulation accuracy and speed, most such proposals are limited to simulating chip multiprocessors (CMPs) with up to hundreds of threads. There is a gap for a framework that can flexibly and accurately model different heterogeneous systems and that scales to a larger number of cores. We implement a solution called HETSIM, a trace-driven, synchronization and dependency-aware framework for fast and accurate pre-silicon performance and power estimation of heterogeneous systems with up to thousands of cores. HETSIM operates in four stages: compilation, emulation, trace generation, and trace replay. Given (i) a specification file, (ii) a multithreaded implementation of the target application, and (iii) an architectural and power model of the target hardware, HETSIM generates performance and power estimates with no further user intervention. HETSIM distinguishes itself from existing approaches by emulating target hardware functionality as software primitives. HETSIM is packaged with primitives that are commonplace across many accelerator designs, and the framework can easily be extended to support custom primitives. We demonstrate the utility of HETSIM through design-space exploration on two recent target architectures: (i) a reconfigurable many-core accelerator, and (ii) a heterogeneous, domain-specific accelerator. Overall, HETSIM demonstrates simulation time speedups of 3.2×-10.4× (average 5.0×) over gem5 in syscall emulation mode, with average deviations in simulated time and power consumption of 15.1% and 10.9%, respectively. HETSIM is validated against silicon for the second target and estimates performance within a deviation of 25.5%, on average.

Highlights

In the last few decades, there has been a strong and consistent trend of adding more parallelism into new architectures and systems [10].
New architectures have been driven by a demand for accelerating increasingly parallel applications, and increasingly irregular ones that rely on memory-intensive algorithms such as sparse linear algebra operations [14]–[18].
Dependency tracking in HETSIM was beneficial to faithfully model the multiple outstanding requests supported by the processing elements (PEs), which is critical for the memory-bound sparse matrix-matrix multiplication (SpMM) workload.
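
The effect of dependency tracking on a memory-bound trace can be illustrated with a toy replay loop. This is only a sketch under stated assumptions and not HETSIM's trace format or replay engine: the `TraceEntry` fields, the `replay` function, the one-entry-per-cycle issue rule, and the latencies are all hypothetical. It compares a replay that stalls only on an entry's recorded dependencies with one that blocks on every earlier entry.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEntry:
    """Hypothetical trace record; not HETSIM's actual format."""
    op: str                                    # e.g. "load" or "compute"
    latency: int                               # cycles from issue to completion
    deps: list = field(default_factory=list)   # indices of earlier entries to wait on


def replay(trace, dependency_aware=True):
    """Return the cycle at which the last trace entry completes.

    Assumes in-order issue of at most one entry per cycle.
    """
    finish = []       # finish[i] is the completion cycle of trace[i]
    issue_cycle = 0   # earliest cycle at which the next entry may issue
    for entry in trace:
        if dependency_aware:
            # Wait only for the entries this one actually depends on.
            ready = max((finish[d] for d in entry.deps), default=0)
        else:
            # Blocking model: wait for every earlier entry to complete.
            ready = max(finish, default=0)
        start = max(issue_cycle, ready)
        finish.append(start + entry.latency)
        issue_cycle = start + 1
    return max(finish, default=0)


# Three independent 100-cycle loads feeding one 1-cycle compute step.
trace = [
    TraceEntry("load", 100),
    TraceEntry("load", 100),
    TraceEntry("load", 100),
    TraceEntry("compute", 1, deps=[0, 1, 2]),
]

print(replay(trace, dependency_aware=True))    # 103: the loads overlap
print(replay(trace, dependency_aware=False))   # 301: the loads serialize
```

In this toy case the blocking replay estimates roughly three times the runtime, because it cannot represent the three loads being outstanding at once.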

Summary

Introduction

In the last few decades, there has been a strong and consistent trend of adding more parallelism into new architectures and systems [10]. New architectures have been driven by a demand for accelerating increasingly parallel applications, and increasingly irregular ones that rely on memory-intensive algorithms such as sparse linear algebra operations [14]–[18]. The post-Dennard scaling era has experienced a similar trend, with heterogeneous systems that have multiple CPUs working in tandem with fixed-function accelerators and GPUs [19]. This includes the broad categories of loosely-coupled accelerators, where the fixed-function logic has a separate data/control path from the main pipeline (e.g. [20]–[22]), as well as tightly-coupled accelerators, where the logic shares resources with the pipeline (e.g. [23]–[26]).

Publication Date: Oct 1, 2020
Citations: 38
License type: other-oa
