Abstract

Memory layout transformations via data reorganization are very common operations that occur either as part of the computation or as a performance optimization in data-intensive applications. These operations require inefficient memory access patterns and roundtrip data movement through the memory hierarchy, and so fail to utilize the performance and energy-efficiency potential of the memory subsystem. This paper proposes a high-bandwidth and energy-efficient hardware accelerated memory layout transform (HAMLeT) system integrated within a 3D-stacked DRAM. HAMLeT uses low-overhead hardware that exploits the existing infrastructure in the logic layer of 3D-stacked DRAMs and requires no changes to the DRAM layers, yet it can fully exploit the locality and parallelism within the stack by implementing efficient layout transform algorithms. We analyze matrix layout transform operations (such as matrix transpose, matrix blocking, and 3D matrix rotation) and demonstrate that HAMLeT can achieve close to peak system utilization, offering up to an order of magnitude performance improvement over CPU and GPU memory subsystems that do not employ HAMLeT.

Highlights

  • Main memory has been a major bottleneck in achieving high performance and energy efficiency for various computing systems

  • In practice, the high performance and energy-efficiency potential on offer is achievable only through efficient use of the main memory

  • We present HAMLeT, a hardware accelerated memory layout transform framework that efficiently reorganizes data within the memory by exploiting 3D-stacked DRAM technology


Summary

INTRODUCTION

Main memory has been a major bottleneck in achieving high performance and energy efficiency for various computing systems. Inefficient memory access patterns lead to excessive DRAM row buffer misses and an uneven distribution of requests across the banks, ranks, or layers, which yields very low bandwidth utilization and incurs significant energy overhead. HAMLeT uses high-bandwidth, low-latency, and dense through-silicon vias (TSVs), together with the customized logic layer underneath the DRAM, to reorganize data in the memory, avoiding the latency and energy overhead of roundtrip data movement through the memory hierarchy and the processor. To our knowledge, HAMLeT is the first work that proposes a high-performance and energy-efficient memory layout transformation accelerator integrated within a 3D-stacked DRAM.
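To make the row buffer effect concrete, the following is a minimal sketch that counts row activations for two traversals of a row-major matrix. It is not HAMLeT's algorithm or the paper's evaluation model. The single-bank, open-row DRAM model, the 8 KiB row size, the 8-byte element size, and the function names are all assumptions chosen for illustration.

```python
def count_row_buffer_misses(addresses, row_bytes=8192):
    """Count row activations in a toy DRAM model.

    Assumptions (not from the paper): one bank, open-row policy,
    and a hypothetical row size of row_bytes.
    """
    open_row = None
    misses = 0
    for addr in addresses:
        row = addr // row_bytes
        if row != open_row:
            # The requested row is not in the row buffer: activate it.
            misses += 1
            open_row = row
    return misses


def row_major_addresses(n, elem_bytes=8, by_column=False):
    """Yield byte addresses for walking an n x n row-major matrix.

    by_column=False walks along rows (unit stride);
    by_column=True walks down columns (stride of n elements).
    """
    for i in range(n):
        for j in range(n):
            r, c = (j, i) if by_column else (i, j)
            yield (r * n + c) * elem_bytes


n = 1024  # with 8-byte elements, one matrix row fills one 8 KiB DRAM row
row_walk = count_row_buffer_misses(row_major_addresses(n))
col_walk = count_row_buffer_misses(row_major_addresses(n, by_column=True))
print(row_walk, col_walk)
```

Under these assumptions the row-wise walk activates each DRAM row once, while the column-wise walk activates a new row on every access. A layout transform such as a transpose or blocking changes which traversal has the unit-stride pattern, which is the kind of mismatch the introduction describes.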

DRAM Operation
HAMLET ARCHITECTURE
LAYOUT TRANSFORM OPERATIONS
Matrix Transpose
Matrix Blocking
Cube Rotation
ADDRESS REMAPPING
EVALUATION
Layout Transform Using HAMLeT
Comparison Against CPU and GPU
Hardware Area and Power Analysis
CONCLUSION