Full System Simulation Environment Research Articles

A traditional fixed-function graphics accelerator has evolved into a programmable general-purpose graphics processing unit over the past few years, the general-purpose computing on GPU (GPGPU). Recently, revolutionary measures have been taken along this direction: an integrated GPU, i.e., CPUs and GPUs are integrated into the same package or even into the same die. However, considering a system-on-chip, the GPU takes up considerable silicon resources, but when running non-graphical workloads or non-GPGPU applications it is likely that overall system performance will not be affected. This paper presents a novel approach to accelerate conventional operations that are normally performed on CPUs, which are bulk memory operations such as memcpy or memcmp, using an integrated GPU. Offloading bulk memory operations to the GPU has many benefits: (i) The throughput GPU outperforms the CPU in bulk memory operations; (ii) for on-die GPUs with unified cache between the GPU and the CPU, the CPU can utilize the GPU private cache to store the moved data and reduce the CPU cache bottleneck; (iii) additional lightweight hardware can also support asynchronous offloads; and (iv) unlike the prior art using a dedicated hardware copy engine (e.g., DMA), our approach utilizes as much GPU hardware resources as possible. The performance results based on our solution showed that offloaded bulk memory operations outperform CPU up to 4.3 times faster on micro-benchmarks while using fewer resources. Using eight real-world applications and a cycle-based full-system simulation environment, five of eight applications showed about 30% speedup and two applications showed about 20% speedup.

In the next decade, a growing number of scientific and industrial applications will require power-efficient systems providing unprecedented computation, memory, and communication resources. A promising paradigm foresees the use of heterogeneous many-tile architectures. The resulting computing systems are complex: they must be protected against several sources of faults and critical events, and application programmers must be provided with programming paradigms, software environments and debugging tools adequate to manage such complexity. The EURETILE (European Reference Tiled Architecture Experiment) consortium conceived, designed, and implemented: 1- an innovative many-tile, many-process dynamic fault-tolerant programming paradigm and software environment, grounded onto a lightweight operating system generated by an automated software synthesis mechanism that takes into account the architecture and application specificities; 2- a many-tile heterogeneous hardware system, equipped with a high-bandwidth, low-latency, point-to-point 3D-toroidal interconnect. The inter-tile interconnect processor is equipped with an experimental mechanism for systemic fault-awareness; 3- a full-system simulation environment, supported by innovative parallel technologies and equipped with debugging facilities. We also designed and coded a set of application benchmarks representative of requirements of future HPC and Embedded Systems, including: 4- a set of dynamic multimedia applications and 5- a large scale simulator of neural activity and synaptic plasticity. The application benchmarks, compiled through the EURETILE software tool-chain, have been efficiently executed on both the many-tile hardware platform and on the software simulator, up to a complexity of a few hundreds of software processes and hardware cores.

Full System Simulation Environment Research Articles

Related Topics

Articles published on Full System Simulation Environment

Accelerated bulk memory operations on heterogeneous multi-core systems

Thread Voting DVFS for Manycore NoCs

MTraceCheck

Dynamic many-process applications on many-tile embedded systems and HPC clusters: The EURETILE programming environment and execution platforms

PDNOC: Partially diagonal network‐on‐chip for high efficiency multicore systems

Dynamic fault‐tolerant routing algorithm for networks‐on‐chip based on localised detouring paths

Multilevel Cache Modeling for Chip-Multiprocessor Systems

Three-Dimensional Chip-Multiprocessor Run-Time Thermal Management

Application of full-system simulation in exploratory system design and development

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Full System Simulation Environment Research Articles

Related Topics

Articles published on Full System Simulation Environment

Accelerated bulk memory operations on heterogeneous multi-core systems

Thread Voting DVFS for Manycore NoCs

MTraceCheck

Dynamic many-process applications on many-tile embedded systems and HPC clusters: The EURETILE programming environment and execution platforms

PDNOC: Partially diagonal network‐on‐chip for high efficiency multicore systems

Dynamic fault‐tolerant routing algorithm for networks‐on‐chip based on localised detouring paths

Multilevel Cache Modeling for Chip-Multiprocessor Systems

Three-Dimensional Chip-Multiprocessor Run-Time Thermal Management

Application of full-system simulation in exploratory system design and development