Abstract

Spin Torque Transfer RAM (STT-RAM), an emerging non-volatile memory technology, has been proposed as a replacement for SRAM-based caches. Its commercialization has recently been accelerated by large companies such as Samsung. Although STT-RAM offers nonvolatility, high density and extremely low leakage power consumption, it suffers from high dynamic energy and long latency on write operations. To address this problem, researchers have proposed an STT-RAM/SRAM hybrid structure that mitigates the cost of write operations. In hybrid caches, a migration-based technique is often adopted to exploit the advantages of both parts by dynamically moving write-intensive and read-intensive data between STT-RAM and SRAM. However, migrations also introduce extra reads and writes during data movement. For stencil loops with read and write data dependencies, it is observed that the migration overhead is significant and that migrations correlate closely with the interleaved read and write memory access pattern within a memory block. A loop retiming technique has been proposed to reduce the migration overhead by changing this interleaved memory access pattern. Loop retiming has also been extensively studied as a way to maximize the instruction-level parallelism (ILP) of multiple functional units by rearranging the dependence delays in a uniform loop. Both retiming techniques work by changing the instruction dependence delays in a loop. However, the earlier ILP-aware loop retiming is unaware of its impact on the hybrid cache's migrations, while the recent migration-aware loop retiming does not fully consider the parallelism of arithmetic and logical units (ALUs) in VLIW processors. The impacts of retiming on both the migration overhead of the hybrid cache and the ILP of the VLIW processor should therefore be considered together when architecting an STT-RAM-based hybrid cache for VLIW processors.
To address this issue, this paper models the impacts of loop retiming on both the ILP of the ALUs and the migration overhead in an STT-RAM/SRAM hybrid cache. An overall balanced loop retiming solution, which considers both the ALU part and the memory part, is devised to achieve high performance for VLIW processors. Experimental results across a set of benchmarks show that the proposed optimal and heuristic balanced retiming approaches effectively improve overall system performance compared with no retiming, pure migration-aware retiming and pure ILP-aware retiming.
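
To make the notion of "rearranging the dependence delays in a uniform loop" concrete, the following is a minimal sketch of the basic retiming operation on a data-flow graph. It is not the balanced retiming method proposed in this paper. It assumes one common sign convention, in which a retiming value r(u) moves r(u) delays from the incoming edges of node u to its outgoing edges, so that an edge u -> v with delay d(u, v) becomes d(u, v) + r(u) - r(v). It also assumes unit-time nodes and that every cycle carries at least one delay. The names `retime`, `critical_path` and the three-node graph are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source node, destination node)


def retime(delays: Dict[Edge, int], r: Dict[str, int]) -> Dict[Edge, int]:
    """Apply a retiming r to the edge delays of a data-flow graph.

    Assumed convention: d_r(u, v) = d(u, v) + r(u) - r(v).
    A retiming is legal only if no edge ends up with a negative delay.
    """
    retimed = {}
    for (u, v), d in delays.items():
        new_d = d + r.get(u, 0) - r.get(v, 0)
        if new_d < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets delay {new_d}")
        retimed[(u, v)] = new_d
    return retimed


def critical_path(delays: Dict[Edge, int]) -> int:
    """Length (in nodes) of the longest chain of zero-delay edges.

    With unit-time nodes this approximates the schedule length of one
    iteration; a shorter chain leaves more room for parallel issue.
    Assumes the zero-delay subgraph is acyclic.
    """
    successors: Dict[str, list] = {}
    nodes = set()
    for (u, v), d in delays.items():
        nodes.update((u, v))
        if d == 0:
            successors.setdefault(u, []).append(v)

    memo: Dict[str, int] = {}

    def longest_from(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max(
                (longest_from(m) for m in successors.get(n, [])), default=0
            )
        return memo[n]

    return max(longest_from(n) for n in nodes)


# Hypothetical loop body: A -> B -> C within an iteration,
# and C feeds A two iterations later.
delays = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}

# Move one delay through node A, from its incoming to its outgoing edge.
retimed = retime(delays, {"A": 1})

print(critical_path(delays), critical_path(retimed))
```

In this example the retiming shortens the zero-delay chain from three nodes to two, while the total delay around the cycle stays at 2. A migration-aware retiming would choose r by a different objective, namely the read and write interleaving within a memory block, which this sketch does not model.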
