Many-core Embedded System Research Articles

High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware-managed caches that require some form of cache coherence management. These “coherence-free” systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we conduct a performance comparison between them. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.
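The abstract does not describe the internals of Full-Mirroring or Distributed Logging, so the following is only a generic software sketch of the two broad families of data versioning that a transactional memory can use: keeping a full copy of memory to restore on abort, or logging old values as they are overwritten. The class names `SnapshotMemory` and `UndoLogMemory` and the toy word-addressed memory are assumptions made for illustration; they are not the paper's hardware designs.

```python
class SnapshotMemory:
    """Toy memory that versions data by copying everything at transaction start."""

    def __init__(self, size):
        self.words = [0] * size
        self.snapshot = None          # full copy, held only during a transaction

    def begin(self):
        self.snapshot = list(self.words)

    def write(self, addr, value):
        self.words[addr] = value

    def commit(self):
        self.snapshot = None          # speculative state becomes permanent

    def abort(self):
        self.words = self.snapshot    # restore the pre-transaction copy
        self.snapshot = None


class UndoLogMemory:
    """Toy memory that versions data by logging old values on each write."""

    def __init__(self, size):
        self.words = [0] * size
        self.undo_log = []            # (address, old_value) pairs, oldest first
        self.in_tx = False

    def begin(self):
        self.undo_log.clear()
        self.in_tx = True

    def write(self, addr, value):
        if self.in_tx:
            self.undo_log.append((addr, self.words[addr]))
        self.words[addr] = value

    def commit(self):
        self.undo_log.clear()         # old values are no longer needed
        self.in_tx = False

    def abort(self):
        # Undo in reverse order so the oldest value of each word wins.
        for addr, old in reversed(self.undo_log):
            self.words[addr] = old
        self.undo_log.clear()
        self.in_tx = False
```

In this sketch the trade-off is visible directly: the snapshot variant pays a copy proportional to memory size at every transaction start but aborts in one step, while the log variant pays only per write but must walk the log on abort.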

Managing on-chip resources for future embedded applications executing concurrently on a NoC (network-on-chip) based many-core embedded system (MES) is a fundamental challenge. Applications must be allocated under constraints on computing resources and communication resources. However, most existing techniques focus only on optimizing communication between application threads and ignore balanced utilization of on-chip resources, which is critical for embedded systems. In this paper, we propose a dynamic resource balance (DRB) algorithm that achieves higher system performance by balancing the utilization of on-chip computing resources and communication resources. The DRB algorithm first constructs a mapping scheme using a dynamic communication optimization (DCO) algorithm and then, using a multi-rectangle selection (MRS) algorithm, chooses a corresponding number of resource regions in which to allocate the application under the constructed mapping scheme. We evaluate the DRB algorithm in the popular simulator Graphite; the results show that DRB improves system throughput by up to 31.6%, 25.2%, and 9.4% compared with the FF (First Free), NN (Nearest Neighbor), and CoNA-SHiC (Contiguous Neighbor Allocation and Smart Hill Climbing) algorithms, respectively.
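The abstract names FF and NN as baselines but gives no details of them, nor of DCO or MRS. As a rough illustration of what core allocation on a mesh involves, the sketch below assumes that First Free takes free cores in row-major scan order and that Nearest Neighbor takes the free cores closest (in Manhattan distance) to the first free core. Both interpretations, the function names, and the boolean-grid representation of the mesh are assumptions, not the definitions used in the paper.

```python
def free_cores(free):
    """Return (row, col) of every free core in row-major order.

    `free` is a 2D list of booleans, True meaning the core is unallocated.
    """
    return [(r, c) for r, row in enumerate(free) for c, ok in enumerate(row) if ok]


def first_free(free, k):
    """Assumed FF baseline: the first k free cores in scan order, or None."""
    cells = free_cores(free)
    return cells[:k] if len(cells) >= k else None


def nearest_neighbor(free, k):
    """Assumed NN baseline: the k free cores nearest to the first free core."""
    cells = free_cores(free)
    if len(cells) < k:
        return None
    seed = cells[0]
    # sorted() is stable, so equally distant cores keep their scan order.
    by_distance = sorted(
        cells, key=lambda rc: abs(rc[0] - seed[0]) + abs(rc[1] - seed[1])
    )
    return by_distance[:k]


mesh = [
    [False, True, True],
    [True, True, False],
    [True, True, True],
]
print(first_free(mesh, 3))        # [(0, 1), (0, 2), (1, 0)]
print(nearest_neighbor(mesh, 3))  # [(0, 1), (0, 2), (1, 1)]
```

On this small mesh the scan-order choice includes core (1, 0), which is two hops from the seed, while the distance-based choice stays within one hop. Neither considers how evenly the remaining cores and links are used, which is the gap the abstract says DRB targets.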

Articles published on Many-core Embedded System

Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

Dynamic application allocation with resource balancing on NoC based many-core embedded systems

Improving Dynamic Memory Allocation on Many-Core Embedded Systems With Distributed Shared Memory

Static Mapping of Multiple Parallel Applications on Non-Hierarchical Manycore Embedded Systems
