Abstract

Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.

Highlights

  • For the past few decades, memory performance improvements have lagged behind compute performance improvements, creating an increasing mismatch between the time to transfer data and the time to perform computations on these data

  • In order to properly manage what is offloaded onto PIM or near-memory processing (NMP) systems, where it is offloaded, and when it is offloaded, prior work has utilized one of three different strategies: (1) code annotation: techniques that rely on the programmers to select and determine the appropriate sections of code to offload; (2) compiler optimization: techniques that attempt to automatically identify what to offload during compile-time; (3) online heuristics: techniques that use a set of rules to determine what to offload during run-time

  • There are several challenges and opportunities for resource management of PIM/NMP substrates related to generalizability, multi-objective considerations, reliability, and the application of more intelligent techniques, e.g., machine learning (ML), as discussed in the article
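As an illustration of the third strategy, an online heuristic might compare a kernel's arithmetic intensity (operations per byte moved) against a threshold and offload memory-bound kernels to the memory-side processing element. The function below is a minimal sketch; the threshold and the `should_offload` cost model are illustrative assumptions, not a technique taken from any surveyed system.

```python
# Minimal sketch of an online offloading heuristic (strategy 3 above).
# The cost model and the 0.5 ops/byte threshold are illustrative
# assumptions, not values from the surveyed works.

def should_offload(ops: int, bytes_accessed: int,
                   intensity_threshold: float = 0.5) -> bool:
    """Decide at run-time whether to offload a kernel to the in-memory PE.

    Kernels with low arithmetic intensity (few operations per byte moved)
    spend most of their time transferring data, so they benefit from the
    higher internal bandwidth of PIM/NMP; compute-bound kernels stay on
    the host, whose cores are faster.
    """
    if bytes_accessed == 0:
        return False  # nothing to move; keep the kernel on the host
    arithmetic_intensity = ops / bytes_accessed
    return arithmetic_intensity < intensity_threshold
```

For example, a streaming vector add performs roughly one operation per dozen bytes transferred and would be offloaded under this rule, while a dense matrix multiply reuses each operand many times and would remain on the host.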


Summary

Introduction

For the past few decades, memory performance improvements have lagged behind compute performance improvements, creating an increasing mismatch between the time to transfer data and the time to perform computations on these data (the “memory wall”). It is evident that the large latencies and energies involved in moving data to the processor will present an overwhelming bottleneck in future systems. To address this issue, researchers have proposed to reduce these costly data movements through data-centric computing (DCC), where some of the computation is moved in proximity to the memory. For example, an Intel x86 server equipped with UPMEM's PIM modules has shown better performance for genomic applications and 10× better energy consumption than an equivalent server without them [4]. Both PIM and NMP systems have the potential to speed up application execution by reducing data movement. We survey the landscape of resource management techniques that decide which computations are offloaded onto PIM/NMP systems.

Prior Surveys and Scope
Data-Centric Computing Architectures
PIM Using DRAM
PIM Using NVM
Near-Memory Processing
PE Types
Memory Types
Resource Management
Optimization Objectives
Performance
Energy Efficiency
Power and Thermal Efficiency
Optimization Knobs
Identification of Offloading Workloads
Selection of Memory PE
Timing of Offloads
Management Techniques
Code Annotation Approaches
Compiler-Based Approaches
Atomic Instructions in HMC
CAIRO
Online Heuristic Approaches
Findings
Conclusions and Future Directions