Dynamic Memory Disambiguation Research Articles

Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations, which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
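The abstract does not say how forward slices are identified, so the following is only a minimal software sketch of the general idea, assuming a simple register-taint scheme: a load marks its destination register, and any later instruction that reads a marked register is treated as part of a forward slice and marks its own destination in turn. The function name `forward_slice`, the tuple encoding of instructions, and the register names are hypothetical and are not taken from the FSC design.

```python
def forward_slice(instrs):
    """Return indices of instructions that depend on a load result.

    instrs: list of (opcode, dest_reg_or_None, [src_regs]) in program order.
    This encoding is an illustrative assumption, not the FSC hardware format.
    """
    tainted = set()   # registers currently holding a load-dependent value
    in_slice = []     # indices of instructions in some forward slice
    for i, (opcode, dest, srcs) in enumerate(instrs):
        depends = any(s in tainted for s in srcs)
        if depends:
            in_slice.append(i)
        if dest is not None:
            if opcode == "load" or depends:
                tainted.add(dest)       # value derives from a load
            else:
                tainted.discard(dest)   # overwritten by a load-independent value
    return in_slice


example = [
    ("load", "r1", ["r0"]),        # 0: produces a load result in r1
    ("add",  "r2", ["r1", "r3"]),  # 1: consumes r1, so it is in the slice
    ("add",  "r4", ["r5", "r6"]),  # 2: independent of any load
    ("mul",  "r7", ["r2", "r4"]),  # 3: consumes r2, transitively dependent
    ("add",  "r1", ["r5", "r5"]),  # 4: overwrites r1 with an independent value
    ("add",  "r8", ["r1", "r5"]),  # 5: reads the new r1, not in the slice
]
slice_indices = forward_slice(example)
```

In this sketch a load is itself marked as part of a slice only when its address depends on an earlier load; whether and how the actual core steers loads is not stated in the abstract.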

An efficient mechanism to track and enforce memory dependences is crucial to an out-of-order microprocessor. The conventional approach of using cross-checked load and store queues, while very effective in earlier processor incarnations, suffers from scalability problems in modern high-frequency designs that rely on buffering many in-flight instructions to exploit instruction-level parallelism. In this paper, we make a case for a very different approach to dynamic memory disambiguation. We move away from the conventional exact disambiguation strategy and adopt an opportunistic method: we allow loads and stores to access an L0 cache as they are issued out of program order, hoping that with such a laissez-faire approach, most loads actually obtain the right value. To guarantee correctness, they execute a second time in program order to access the nonspeculative L1 cache. A discrepancy between the two executions triggers a replay. Such a design completely eliminates the necessity of real-time violation detection and thus avoids the conventional approach's complexity and the associated scalability issue. We show that even a simplistic design can provide a performance level similar to that achieved with a conventional queue-based approach with optimistically sized queues. When simple, optional optimizations are applied, the performance level is close to that achieved with ideally sized queues.
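The two-pass scheme described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the paper's design: the L0 and L1 caches are modeled as plain dictionaries, unwritten addresses read as 0, and a replay is simply recorded rather than modeled as a pipeline squash with L0 repair. The function name `run` and the tuple encoding of memory operations are hypothetical.

```python
def run(program, issue_order):
    """Model speculative out-of-order access to L0, then in-order re-execution on L1.

    program: list of ("st", addr, value) or ("ld", addr) in program order.
    issue_order: permutation of indices giving the out-of-order issue sequence.
    Returns (committed load values by index, indices of loads needing a replay).
    """
    # Pass 1: loads and stores touch the speculative L0 in issue order,
    # with no dependence checking at all.
    l0 = {}
    speculative = {}
    for i in issue_order:
        op = program[i]
        if op[0] == "st":
            l0[op[1]] = op[2]
        else:
            speculative[i] = l0.get(op[1], 0)

    # Pass 2: the same operations access the nonspeculative L1 in program order.
    l1 = {}
    committed = {}
    replays = []
    for i, op in enumerate(program):
        if op[0] == "st":
            l1[op[1]] = op[2]
        else:
            correct = l1.get(op[1], 0)
            if speculative[i] != correct:
                replays.append(i)   # discrepancy between the two executions
            committed[i] = correct
    return committed, replays


prog = [("st", 0x10, 1), ("ld", 0x10), ("st", 0x10, 2), ("ld", 0x10)]
in_order = run(prog, [0, 1, 2, 3])    # no reordering, so no replay
reordered = run(prog, [0, 2, 1, 3])   # second store issues before the first load
```

With the reordered issue sequence, the first load speculatively reads the value of the younger store, the in-order pass sees a different value, and that load is flagged for replay. No load queue or store queue comparison is involved in detecting the violation.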

Articles published on Dynamic Memory Disambiguation

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Slackened Memory Dependence Enforcement

ARB: a hardware mechanism for dynamic reordering of memory references

Dynamic memory disambiguation using the memory conflict buffer
