To fit increasingly large models into limited GPU memory, various coarse-grained techniques, such as recomputation and swapping, have been proposed to optimize memory usage. However, these methods suffer from either insufficient memory reduction or degraded training performance. In response, this paper introduces DELTA, a memory-efficient approach to large-scale model training that combines fine-grained memory optimization with prefetching to reduce memory usage while maintaining high training throughput. We first formulate the joint memory-throughput optimization as a 0/1 knapsack problem. Building on this formulation, we solve it effectively with an improved polynomial-time heuristic algorithm. We further introduce a novel bidirectional prefetching scheme into dynamic memory management, which significantly accelerates training compared to relying solely on recomputation or swapping. Finally, DELTA provides users with an automated training execution library, eliminating the need for manual configuration or specialized expertise. Experimental results demonstrate the effectiveness of DELTA in reducing GPU memory consumption: compared to state-of-the-art methods, DELTA achieves memory savings of 40% to 72% while maintaining comparable convergence for various models, including ResNet-50, ResNet-101, and BERT-Large. Notably, DELTA enables training GPT2-Large and GPT2-XL with batch sizes increased by 5.5× and 6×, respectively, showcasing its versatility and practicality for large-scale model training on GPU hardware.
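The abstract describes formulating eviction planning as a 0/1 knapsack problem solved by a polynomial-time heuristic, but does not specify the cost model. The minimal Python sketch below illustrates one plausible greedy ratio heuristic under assumed inputs; the names `Candidate`, `mem_saved`, `time_cost`, and `plan_evictions` are hypothetical and do not reflect DELTA's actual implementation.

```python
# Illustrative sketch only: assumes each candidate tensor, if evicted
# (recomputed or swapped out), frees `mem_saved` bytes at a `time_cost`
# overhead, and evictions are chosen greedily by savings-per-cost until a
# target memory reduction is met. The real DELTA heuristic may differ.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str        # tensor identifier (hypothetical)
    mem_saved: int   # bytes freed if this tensor is evicted
    time_cost: float # estimated overhead (seconds) to restore it later

def plan_evictions(candidates, mem_target):
    """Greedy 0/1 knapsack-style heuristic: prefer candidates that free the
    most memory per unit of overhead until `mem_target` bytes are freed."""
    ranked = sorted(candidates, key=lambda c: c.mem_saved / c.time_cost, reverse=True)
    chosen, freed, cost = [], 0, 0.0
    for c in ranked:
        if freed >= mem_target:
            break
        chosen.append(c)
        freed += c.mem_saved
        cost += c.time_cost
    return chosen, freed, cost

# Example usage with made-up numbers.
plan, freed, cost = plan_evictions(
    [Candidate("act_layer3", 512 << 20, 0.8),
     Candidate("act_layer7", 256 << 20, 0.1),
     Candidate("act_layer9", 128 << 20, 0.5)],
    mem_target=600 << 20,
)
print([c.name for c in plan], freed, cost)
```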