Cache Access Research Articles

Current trends suggest that future many-core processors will be implemented using the hardware-managed, implicitly addressed, coherent-caches memory model. With this memory model, all on-chip storage is used for private and shared caches that are kept coherent by hardware. Cores communicate by writing to and reading from shared memory, and a scalable point-to-point interconnection network transmits the messages. Cache coherence in this context is guaranteed by a directory-based protocol. Unfortunately, it has previously been shown that the directory structure required to keep track of sharers can restrict the scalability of these designs, either because of its excessive area or energy requirements or, for a compressed directory, because of the additional coherence traffic it can cause in some cases. In addition, in many-core architectures, memory blocks are commonly assigned to the banks of a NUCA shared cache by following a physical mapping. This mapping assigns blocks to cache banks in a round-robin fashion, thus neglecting the distance between the cores that most frequently access each block and the NUCA bank that holds it. This issue affects both cache access latency and the amount of on-chip network traffic generated, and it causes some area- and energy-efficient compressed directories to significantly increase the number of messages per coherence event, which ultimately degrades performance. In this work we propose an efficient, low-overhead coherence directory built around two main ingredients. The first is the distance-aware round-robin mapping policy, an OS-managed policy that tries to map the pages accessed by a core to its closest (local) bank while introducing an upper bound on the deviation of the distribution of memory pages among cache banks, which lessens the number of off-chip accesses. The second is a highly compressed directory structure that takes advantage of this mapping policy to represent sharers very compactly without increasing coherence network traffic. Simulation results for a 32-core architecture demonstrate that, compared to a full-map directory using the typical round-robin physical mapping policy, our proposal drastically reduces the size of the directory structure (and thus its area and energy requirements); at the same time, it does not increase coherence network traffic and achieves average savings of 6% in execution time.
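The contrast between the two mapping policies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 2D-mesh layout, the Manhattan distance metric, the first-touch assignment, and the names `round_robin_bank`, `DistanceAwareMapper`, and `threshold` are all assumptions made for illustration. The sketch only shows the general idea of preferring the closest bank while bounding how far any bank's page count may exceed that of the least-loaded bank.

```python
from dataclasses import dataclass, field


def round_robin_bank(block_addr: int, num_banks: int) -> int:
    """Physical mapping: consecutive blocks go to consecutive banks,
    regardless of which core accesses them."""
    return block_addr % num_banks


@dataclass
class DistanceAwareMapper:
    """Hypothetical OS-level page-to-bank mapper (illustrative only).

    Assumes one core and one bank per tile of a mesh_width x mesh_height
    mesh, with tile i located at (i % mesh_width, i // mesh_width).
    """
    mesh_width: int
    mesh_height: int
    threshold: int  # assumed bound on (bank page count - minimum page count)
    pages_per_bank: list = field(default_factory=list)
    page_to_bank: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.pages_per_bank = [0] * (self.mesh_width * self.mesh_height)

    def _distance(self, a: int, b: int) -> int:
        """Manhattan distance between tiles a and b."""
        ax, ay = a % self.mesh_width, a // self.mesh_width
        bx, by = b % self.mesh_width, b // self.mesh_width
        return abs(ax - bx) + abs(ay - by)

    def map_page(self, page: int, core: int) -> int:
        """Return the bank for `page`, assigning it on first touch."""
        if page in self.page_to_bank:
            return self.page_to_bank[page]
        least_loaded = min(self.pages_per_bank)
        # Try banks from nearest to farthest; ties broken by bank index.
        candidates = sorted(
            range(len(self.pages_per_bank)),
            key=lambda bank: (self._distance(core, bank), bank),
        )
        for bank in candidates:
            # Accept the bank only if it stays within the balance bound.
            if self.pages_per_bank[bank] - least_loaded < self.threshold:
                self.pages_per_bank[bank] += 1
                self.page_to_bank[page] = bank
                return bank
        raise RuntimeError("unreachable: the least-loaded bank always qualifies")


if __name__ == "__main__":
    mapper = DistanceAwareMapper(mesh_width=2, mesh_height=2, threshold=2)
    for page in range(3):
        print(page, "->", mapper.map_page(page, core=0))
```

With a small `threshold`, pages spill over to neighbouring banks sooner, which keeps the distribution even; with a large one, more pages stay in the local bank. How the real policy chooses and enforces its bound is not specified in the abstract.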

Articles published on Cache Access

Cache based Side Channel Attack on AES in Cloud Computing Environment

Improving last level shared cache performance through mobile insertion policies (MIP)

Hybrid Shared-aware Cache Coherence Transition Strategy

A low-power-oriented cache design for multicore processors

Data Rate Estimation for Wireless Core-to-Cache Communication in Multicore CPUs

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Software-Based Self-Test for Small Caches in Microprocessors

DASC-DIR: a low-overhead coherence directory for many-core processors

NUCA-L1

Branch Prediction-Directed Dynamic Instruction Cache Locking for Embedded Systems

FastTag: A Technique to Protect Cache Tags Against Soft Errors

Cache Hierarchy Optimization

Collective Communication Optimization for Solving Linear Algebraic Equations

ATLAS offline software performance monitoring and optimization

Revisiting LP-NUCA Energy Consumption

ASYNCHRONOUS INSTRUCTION CACHE MEMORY FOR AVERAGE-CASE PERFORMANCE

Column selection solutions for L1 data caches implemented using eight‐transistor cells

Linked instruction caches for enhancing power efficiency of embedded systems

A cache-aware motion estimation organization for a hardware-based H.264 encoder

Exploiting Early Tag Access for Reducing L1 Data Cache Energy in Embedded Processors
