Abstract
GPUs deliver higher performance than traditional processors, offer remarkable energy efficiency, and are rapidly becoming popular processors for HPC applications. Still, writing efficient and scalable programs for GPUs is not an easy task, as codes must adapt to increasingly parallel architectural features. In this chapter, the authors describe in full detail the design and implementation strategies for lattice Boltzmann (LB) codes that meet these goals. Most of the discussion uses a state-of-the-art thermal lattice Boltzmann method in 2D, but all lessons learned in this particular case can be immediately extended to most LB codes and to other scientific applications. The authors describe the structure of the code and discuss in detail several key design choices, guided by theoretical performance models and experimental benchmarks, with both single-GPU codes and massively parallel implementations on commodity GPU clusters in mind. Finally, the authors present and analyze performance on several recent GPU architectures, including data on energy optimization.