Convolutional neural networks (CNNs) demand high performance and energy efficiency for real-time inference, and costly off-chip memory accesses place an additional burden on their execution. To avoid off-chip accesses, we propose ALAMNI, a novel near-memory architecture that accelerates CNNs in the logic layer of the Hybrid Memory Cube (HMC). We exploit intra- and inter-vault parallelism to speed up the highly parallel CNN operations. ALAMNI replaces costly CNN multiplications with lookaside memory (LAM) based searches. An adaptive LAM update policy eliminates the overhead of data pre-profiling, making ALAMNI effective on unseen data. The ALAMNI controller keeps the most frequent triplets of weight (W), activation (A), and multiplication result (M), <W, A, M>, in the LAM to eliminate redundant computations. As an optimization, we incorporate bitmasking to raise the LAM hit rate and further amortize computation. We also study the relation between the degree of bitmasking and the classification-accuracy loss of popular ConvNets, and keep bitmasking as a reconfigurable feature of the ALAMNI units so that a desired classification accuracy can be achieved. Experimental results show substantial improvements in system performance and energy efficiency over both the baseline and state-of-the-art designs.
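To make the LAM idea concrete, the following is a minimal software sketch of a lookaside table of <W, A, M> triplets with bitmasked lookup and a frequency-based adaptive update. The table capacity, the number of masked low-order bits, the least-frequently-hit eviction rule, and the unsigned fixed-point operand assumption are all illustrative choices for this sketch, not the paper's exact hardware policy.

```python
from collections import Counter

class LookasideMemory:
    """Sketch of a LAM holding <W, A, M> triplets (illustrative, not the paper's design)."""

    def __init__(self, capacity=256, mask_bits=2):
        self.capacity = capacity              # number of <W, A, M> triplets held
        self.mask = ~((1 << mask_bits) - 1)   # clears mask_bits low-order bits
        self.table = {}                       # (W', A') -> M, with masked operands
        self.hits = Counter()                 # hit counts, used for eviction

    def multiply(self, w, a):
        # Assumes unsigned fixed-point operands; masking low-order bits makes
        # nearby operand pairs share an entry, raising the hit rate at the
        # cost of an approximate product (the accuracy trade-off in the text).
        key = (w & self.mask, a & self.mask)
        if key in self.table:                 # hit: reuse the stored product
            self.hits[key] += 1
            return self.table[key]
        m = key[0] * key[1]                   # miss: compute on masked operands
        if len(self.table) >= self.capacity:
            # Adaptive update: evict the least frequently hit triplet so the
            # LAM tracks the current data without any pre-profiling pass.
            victim, _ = min(self.hits.items(), key=lambda kv: kv[1])
            del self.table[victim], self.hits[victim]
        self.table[key] = m
        self.hits[key] = 1
        return m
```

In this sketch, `mask_bits` plays the role of the reconfigurable bitmasking knob: widening the mask raises the hit rate (fewer real multiplications) while increasing the approximation error that degrades classification accuracy.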