Abstract
Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key advantages of this approach are the relative ease with which complex physical behavior, e.g. associated with multi-phase flows or irregular boundary conditions, can be modeled and, from a computational perspective, the large degree of available parallelism, which can be easily exploited on massively parallel systems. The advent of multi-core and many-core processors, including General Purpose Graphics Processing Units (GP-GPUs), has extended the quest for parallelization to the intra-processor level. From this point of view, LB methods may strongly benefit from these new architectures. In this paper we describe the implementation and optimization of a recently proposed thermal LB model, the so-called D2Q37 model, on multi-GPU systems. We describe in detail the optimization techniques that we have used at both the intra-processor and inter-processor levels, present performance and scaling figures, and analyze the bottlenecks associated with this implementation.