A Two-Level Load/Store Queue Based on Execution Locality

Miquel Pericàs, Ruben González, Daniel A Jiménez, Alex Veidenbaum, Francisco J Cazorla, Adrian Cristal, Mateo Valero

doi:10.1145/1394608.1382171

Abstract

Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation it is necessary to design processors with many lower-performance cores for TLP and some high-performance cores designed to execute sequential algorithms. Such cores will need to address the memory wall by implementing kilo-instruction windows. Large-window processors require large Load/Store Queues (LSQs) that would be too slow if implemented using current CAM-based designs. This paper proposes an Epoch-based Load/Store Queue (ELSQ), a new design based on Execution Locality. It is integrated into a large-window processor that has a fast, out-of-order core operating only on L1/L2 cache hits and N slower cores that process L2 misses and their dependent instructions. The large LSQ is coupled with the slow cores and is partitioned into N small and local LSQs, one per core. We evaluate ELSQ in a large-window environment, finding that it enables high performance at low power. By exploiting locality among loads and stores, ELSQ outperforms even an idealized central LSQ when implemented on top of a decoupled processor design.
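To make the idea of a large LSQ partitioned into N small local queues more concrete, here is a minimal toy sketch in Python. It is not the ELSQ design from the paper: the class names, the search order (local queue first, then older partitions), the hit counters, and the treatment of partition index as age are all assumptions made purely for illustration, and the model ignores timing, speculation, and disambiguation hardware entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Store:
    seq: int    # program-order sequence number (smaller = older)
    addr: int
    value: int


@dataclass
class LocalLSQ:
    """Hypothetical small store queue owned by one partition."""
    capacity: int
    stores: List[Store] = field(default_factory=list)

    def insert(self, store: Store) -> None:
        # Assumption: stores arrive in program order within a partition.
        if len(self.stores) >= self.capacity:
            raise OverflowError("local queue full")
        self.stores.append(store)

    def forward(self, addr: int, load_seq: int) -> Optional[Store]:
        # Scan youngest-first for an older store to the same address.
        for st in reversed(self.stores):
            if st.seq < load_seq and st.addr == addr:
                return st
        return None


class PartitionedLSQ:
    """Toy model: N local queues, with partition 0 assumed to be the oldest."""

    def __init__(self, n_partitions: int, capacity: int) -> None:
        self.parts = [LocalLSQ(capacity) for _ in range(n_partitions)]
        self.local_hits = 0    # loads satisfied by their own partition
        self.remote_hits = 0   # loads satisfied by an older partition

    def store(self, part: int, seq: int, addr: int, value: int) -> None:
        self.parts[part].insert(Store(seq, addr, value))

    def load(self, part: int, seq: int, addr: int,
             memory: Dict[int, int]) -> int:
        # Cheap case: a matching store sits in the load's own small queue.
        hit = self.parts[part].forward(addr, seq)
        if hit is not None:
            self.local_hits += 1
            return hit.value
        # Otherwise walk older partitions, nearest first.
        for older in range(part - 1, -1, -1):
            hit = self.parts[older].forward(addr, seq)
            if hit is not None:
                self.remote_hits += 1
                return hit.value
        # No in-flight store matches: read memory (0 if never written).
        return memory.get(addr, 0)
```

In this sketch, a load that finds its producing store in the same partition never touches the other queues, which is the kind of locality among loads and stores the abstract refers to. How often that happens in real workloads, and how the actual design handles the non-local case, is not something this toy model can show.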

Journal: ACM SIGARCH Computer Architecture News	Publication Date: Jun 1, 2008
Citations: 24	License type: other-oa
