Abstract

In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real application example: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We evaluated the performance on two hardware platforms that represent different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on the Amazon Web Services (AWS) Cloud; and a consumer-oriented platform, using an Intel Core i9 9900K CPU and an NVIDIA GeForce GTX 1050 Ti GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that cause performance degradation on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases.

Highlights

  • Discrete simulation methods encompass a family of modeling techniques which employ entities that inhabit discrete states and evolve in discrete time steps

  • We present parallel implementations, for multicore CPUs and for graphics processing units (GPUs), of the cellular automaton model of laser dynamics introduced by Guisado et al. [16,17,18]

  • We evaluated the performance on a high-performance server CPU running in the Cloud, using the Amazon Web Services (AWS) Infrastructure as a Service (IaaS) EC2 service


Introduction

Discrete simulation methods encompass a family of modeling techniques which employ entities that inhabit discrete states and evolve in discrete time steps. Examples include the lattice Boltzmann method (LBM) and discretizations of continuous models, such as many stencil-based partial differential equation (PDE) solvers and particle methods based on fixed neighbor lists. They are powerful tools that have been widely used to simulate complex systems of very different kinds (in which a global behavior results from the collective action of many simple components that interact locally) and to solve systems of differential equations.

Efficient parallel implementations of this kind of discrete simulation are extremely important. This type of discrete algorithm has a strong parallel nature, because each model is composed of many individual components or cells that are simultaneously updated. These algorithms also have a local nature, since the evolution of cells is determined by strictly local rules; i.e., each cell only interacts with a low …

Electronics 2020, 9, 189; doi:10.3390/electronics9010189 www.mdpi.com/journal/electronics
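The simultaneous, strictly local update described above can be illustrated with a minimal sketch. It is not the laser dynamics model: it assumes a generic two-dimensional cellular automaton with Conway's Game of Life rule as a stand-in, a Moore neighborhood, and periodic boundaries, and the names `step` and `blinker` are hypothetical. The point it shows is that every new cell state is computed from the old grid only and written to a second buffer, so the per-cell updates are independent of one another.

```python
def step(grid):
    """Return the next generation of a 2D cellular automaton.

    Assumption: Conway's Game of Life rule is used only as a stand-in
    for a local update rule; it is not the laser dynamics model.
    """
    rows, cols = len(grid), len(grid[0])
    # Second buffer: all reads come from `grid`, all writes go to `new_grid`,
    # which makes the update behave as if every cell changed simultaneously.
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells in the Moore neighborhood (8 nearest cells),
            # wrapping around the edges (periodic boundaries).
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Local rule: a cell is alive next step if it has exactly 3 live
            # neighbors, or if it is alive now and has exactly 2.
            new_grid[r][c] = 1 if live == 3 or (grid[r][c] == 1 and live == 2) else 0
    return new_grid


# Usage: a vertical "blinker" oscillates with period 2.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = step(blinker)
after_two = step(after_one)
```

Because no iteration of the two nested loops depends on another, the loop body is the kind of work that could be distributed across CPU threads or GPU threads; this sequential version only makes that independence visible.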
