Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Jakub Skrzypczak, Paweł Czarnul

doi:10.1016/j.simpat.2022.102691

Abstract

In this paper, we present an efficient parallel OpenMP+CUDA implementation of crowd simulation for hybrid CPU+GPU systems and demonstrate that it outperforms CPU-only and GPU-only implementations for several problem sizes: 10 000, 50 000, 100 000, 500 000 and 1 000 000 agents. We show how performance varies with tile size and which CPU–GPU load balancing settings should be preferred for various domain sizes on a high performance system with 2 Intel Xeon Silver multicore CPUs and 8 NVIDIA Quadro RTX 5000 GPUs. We then present how execution time depends on the number of agents and on the number of CUDA streams used to execute several CUDA kernels in parallel. We discuss the design and implementation of the algorithm, including CPU computational threads, GPU management threads, and the assignment of particular tasks to threads, as well as the use of pinned memory and CUDA shared memory to maximize performance.
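The CPU–GPU load balancing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper `split_tiles`, the `TileRange` type, and the `cpu_share` parameter are hypothetical names introduced here, and the sketch assumes that the domain is cut into equally sized tiles, that a fixed fraction of them goes to the CPU computational threads, and that the remainder is divided as evenly as possible among the GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of tile indices assigned to one device.
// Hypothetical type, introduced only for this illustration.
struct TileRange {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// Hypothetical helper: split `tiles` tiles between the CPU and `gpus` GPUs.
// `cpu_share` in [0, 1] is the assumed fraction of tiles handled by the CPU
// threads; the rest is divided as evenly as possible among the GPUs.
// Element 0 of the result is the CPU range, elements 1..gpus are GPU ranges.
std::vector<TileRange> split_tiles(std::size_t tiles, std::size_t gpus,
                                   double cpu_share) {
    assert(cpu_share >= 0.0 && cpu_share <= 1.0);

    // Without any GPU, the CPU has to process every tile.
    std::size_t cpu_tiles =
        (gpus == 0) ? tiles : static_cast<std::size_t>(tiles * cpu_share);

    std::vector<TileRange> ranges;
    ranges.push_back({0, cpu_tiles});

    std::size_t rest = tiles - cpu_tiles;
    std::size_t next = cpu_tiles;
    for (std::size_t g = 0; g < gpus; ++g) {
        // The first (rest % gpus) GPUs receive one extra tile.
        std::size_t n = rest / gpus + (g < rest % gpus ? 1 : 0);
        ranges.push_back({next, next + n});
        next += n;
    }
    return ranges;
}
```

For example, `split_tiles(100, 8, 0.25)` gives the CPU tiles 0 to 24 and spreads the remaining 75 tiles over 8 GPUs in chunks of 10 or 9. In a real hybrid code, the share would be tuned per domain size, which is the kind of setting the abstract says the paper evaluates.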

Journal: Simulation Modelling Practice and Theory
Publication Date: Nov 24, 2022
Citations: 3

Similar Papers

SPARC: Accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory: Isolated clusters
Swarnava Ghosh ... Phanish Suryanarayana
Computer Physics Communications | VOL. 212
19 Oct 2016

Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink
Yea Rem Choi ... Vsevolod Nikolskiy
17 Nov 2020

Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
Guojing Cong ... David A Bader
01 Jan 2007

Accelerated Molecular Mechanical and Solvation Energetics on Multicore CPUs and Manycore GPUs.
Deukhyun Cha ... Rezaul A Chowdhury
ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB) | VOL. 2015
09 Sep 2015
