Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness

Minh Pham, Yongke Yuan, Hao Li, Chengcheng Mou, Yicheng Tu, Zichen Xu, Jinghan Meng

doi:10.1145/3701623

Abstract

Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today’s data-intensive computing. Developing system software that efficiently supports the numerous parallel threads of such hardware poses unique challenges. One such challenge is the design of a dynamic memory allocator to allocate memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. However, in massively parallel systems, access to such global data structures can quickly become a bottleneck, even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure: threads employ random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution, Ouroboros, in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and further enhance performance when free memory is limited. Our mathematical proofs and experimental results affirm that these advanced designs can yield an order of magnitude improvement over the basic design and consistently outperform the state-of-the-art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. Both case studies show pronounced performance gains from our approach.
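
The basic idea of locating free pages by random search, rather than through a shared queue, can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `random_alloc` and `free_page`, the boolean-list page table, and the `max_probes` cutoff are all hypothetical, and the code runs on a CPU in a single thread. On a GPU, the check-and-claim step would presumably need to be one atomic operation (such as a compare-and-swap) so that two threads cannot claim the same page.

```python
import random


def random_alloc(used, rng, max_probes=10_000):
    """Probe uniformly random page slots until a free one is claimed.

    `used` is a list of booleans, one per page (hypothetical layout).
    Returns (page_index, probes), or (None, max_probes) if every probe
    landed on an occupied page.
    """
    n = len(used)
    for probes in range(1, max_probes + 1):
        page = rng.randrange(n)   # pick a random candidate page
        if not used[page]:        # on a GPU: an atomic test-and-set
            used[page] = True     # claim the page
            return page, probes
    return None, max_probes


def free_page(used, page):
    """Release a page; no global free list has to be updated."""
    used[page] = False


if __name__ == "__main__":
    rng = random.Random(0)
    num_pages = 1024
    used = [False] * num_pages

    # Assumed workload: mark half of the pages as occupied up front.
    for p in rng.sample(range(num_pages), num_pages // 2):
        used[p] = True

    trials = 1000
    total_probes = 0
    for _ in range(trials):
        page, probes = random_alloc(used, rng)
        total_probes += probes
        if page is not None:
            free_page(used, page)  # keep occupancy constant across trials

    print("average probes per allocation:", total_probes / trials)
```

In this simplified model, where each probe is independent and uniform and a fraction f of the pages is free, the number of probes per allocation follows a geometric distribution with mean 1/f. The cost therefore grows as free memory shrinks, which is consistent with the abstract's remark that further techniques are needed when free memory is limited.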


Published in: ACM Transactions on Parallel Computing


Similar Papers

GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data
Muaaz Gul Awan ... Fahad Saeed
Computers in Biology and Medicine | VOL. 101
16 Aug 2018

Smart dynamic memory allocator for embedded systems
M Ramakrishna ... Youngki Chung
01 Oct 2008

Ouroboros
Martin Winter ... Mathias Parger
29 Jun 2020

Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs.
Minh Pham ... Yicheng Tu
ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing | VOL. 2022
28 Jun 2022
