A VLSI approach to cache memory: Threewit, B. Comput. Des. Vol 21 No 1 (January 1982) pp 169–172

doi:10.1016/0141-9331(82)90072-2

Abstract

This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on a single GPU and for good scaling behavior across a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence.

GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures and the overheads of node-to-node communication must be properly handled.

We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and by experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. The outcome is a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used to develop other high-performance applications for computational physics.


Journal: Microprocessors and Microsystems


Similar Papers

Massively parallel lattice–Boltzmann codes on large GPU clusters
E Calore ... E Pellegrini
Parallel Computing, Vol. 58, 01 Oct 2016

Design and Optimizations of Lattice Boltzmann Methods for Massively Parallel GPU-Based Clusters
Enrico Calore ... Raffaele Tripiccione
01 Jan 2018

Progress in a novel architecture for high performance processing
Zhiwei Zhang ... Meng Liu
Japanese Journal of Applied Physics, Vol. 57, 13 Mar 2018

Exploring high-performance processor architecture beyond the exascale
Xiang-Hui Xie ... Xun Jia
Frontiers of Information Technology & Electronic Engineering, Vol. 19, 01 Oct 2018