Abstract

Since its inception, the field of computational fluid dynamics (CFD) has relied on the Navier-Stokes (NS) equations to govern the simulation of fluid flow. By solving the NS equations at all scales, accurate simulations are possible, albeit at a high cost in computational time and power. The exorbitant cost of accuracy is often circumvented through numerical models that approximate some, if not all, scales. Technological advancements have also helped reduce simulation costs, but their impact remains limited because they focus on parallel processing, an approach incompatible with the implicit NS equations. In response to these shortcomings, the Lattice-Boltzmann method (LBM) emerged as an alternative approach for directly solving the NS equations. Rooted in kinetic theory, LBM describes the behavior of fluids at a macroscopic level by averaging the fluid particles' interactions at a microscopic level. Thus, by following a mesoscopic approach, LBM recovers the NS equations at a cost traditionally matched only by methods employing approximate models. Furthermore, the discrete nature, high locality, and explicit computational scheme of LBM make it an ideal target for parallel processors. Paired with powerful acceleration hardware, LBM holds the promise of taking CFD simulations to the next level, reaching simulation speeds beyond what Navier-Stokes solvers could achieve. In this paper, we explore the limitations of today's hardware accelerators, which prevent LBM from realizing its full potential. We provide an overview of successful LBM implementations on general-purpose graphics processing units (GPGPUs) and field-programmable gate arrays (FPGAs), and argue that despite their previous successes, neither platform is likely to provide performance improvements beyond what has already been achieved. As such, we conduct a preliminary study to evaluate the feasibility of an application-specific integrated circuit (ASIC) accelerator, specifically designed to expedite LBM-based simulations.
Focusing on the general Lattice-Boltzmann method, the chip design implements the collision and streaming steps, together with a periodic boundary condition. Using the findings of our study, we compare the relative performance and merits of LBM implementations on GPUs, FPGAs, and ASICs, all based on 16-nm technology. Preliminary results indicate that an ASIC LBM accelerator may deliver a 33× performance improvement relative to GPU devices.
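
To make the collision and streaming steps with a periodic boundary concrete, the following is a minimal software sketch, not the chip design itself. It assumes a D2Q9 lattice and a single-relaxation-time (BGK) collision operator, neither of which is specified in the abstract; the function names and the relaxation time `tau = 0.8` are likewise illustrative choices.

```python
# Assumption: D2Q9 lattice with its standard velocity set and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, tau):
    """Assumed BGK collision: relax each cell toward its own equilibrium.

    Only the cell's nine populations are read, so the step is fully local.
    """
    out = []
    for row in f:
        new_row = []
        for cell in row:
            rho = sum(cell)
            ux = sum(fi * cx for fi, (cx, _) in zip(cell, C)) / rho
            uy = sum(fi * cy for fi, (_, cy) in zip(cell, C)) / rho
            feq = equilibrium(rho, ux, uy)
            new_row.append([fi - (fi - fe) / tau for fi, fe in zip(cell, feq)])
        out.append(new_row)
    return out


def stream(f):
    """Shift each population one cell along its velocity.

    The modulo wrap implements the periodic boundary condition.
    """
    ny, nx = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (cx, cy) in enumerate(C):
                out[(y + cy) % ny][(x + cx) % nx][i] = f[y][x][i]
    return out


def step(f, tau=0.8):
    """One explicit time step: collision followed by streaming."""
    return stream(collide(f, tau))


def total_mass(f):
    """Sum of all populations over the whole grid."""
    return sum(sum(sum(cell) for cell in row) for row in f)


def init(nx, ny):
    """Fluid at rest with a small density bump in the middle."""
    f = [[equilibrium(1.0, 0.0, 0.0) for _ in range(nx)] for _ in range(ny)]
    f[ny // 2][nx // 2] = equilibrium(1.1, 0.0, 0.0)
    return f
```

Collision touches a single cell and streaming touches only nearest neighbours, which is the locality the abstract refers to when it calls LBM a good target for parallel hardware. Calling `step` repeatedly on the grid returned by `init(8, 6)` should conserve `total_mass` up to floating-point rounding.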
