Abstract

This research further extends an existing state-of-the-art GPU framework of SPH simulation for better performance without any compromise of numerical accuracy and sacrifice of simulation detail. Towards this grand goal, we devise three new strategies. First, we design a novel hierarchical grid to decompose the simulation space, where we can locate particles in a more refined unit space. With this organization, particle distribution coherence in global memory is improved. As a result, global memory operations can be more efficient. Second, based on our well designed hierarchical grid, we propose a hierarchical neighbor search strategy for catering to the heterogeneous distribution of particles in cells, so we can search a neighbor particle in different-level grids. In a cell with dense particles, we can search particles in a higher-level grid whose unit space is more refined, which means we can narrow the search space for decreasing the access of false neighbor particles. In contrast, in a cell with sparse particles, we search particles in a lower-level grid whose unit space is relatively large, which means we can decrease the loop iterations when we search the entire neighbor cell. In this way, we can avoid the unnecessary overhead of loop iterations. After designing a reasonable neighbor search strategy, an efficient thread cooperation strategy can further improve the performance of our framework by realizing more potentials of GPGPU. The existing state-of-the-art method is concentrating on task assignment strategy for taking the full advantage of shared memory. The method decomposes particles in a cell into several tasks and then assigns a cooperation thread array (CTA) to each task. However, this method has not fully considered the uniformity of tasks belonging to the same cell, as these tasks always share a lot of particles with the same neighborhood. Finally, we propose a hierarchical task assignment strategy by merging such successive tasks into a larger task, which means we only load the same neighbor particles once and the corresponding CTAs are arranged to work together to handle the new task. Our method can greatly reduce the overload of neighbor particles and the overhead of neighbor search iterations. Through our comprehensive tests validated in practice, our work can exhibit 1.73× speedup when compared with other state-of-the-art GPGPU frameworks for SPH simulation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call