Abstract

The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid programming model that combines CUDA and MPI. The main challenge is to keep all general-purpose graphics processing units (GPGPUs) as busy as possible by efficiently redistributing replicas among them. We report tests on hardware based on Intel Skylake CPUs and Nvidia V100 GPUs, running more than two million replicas of an Ising model sample in parallel. The results are encouraging: as the complexity of the simulated system increases, the speedup approaches ideal linear scaling.
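The abstract does not say which resampling rule or redistribution scheme the implementation uses. The following is a minimal sketch under stated assumptions. It uses standard nearest-integer resampling for one temperature step beta -> beta + dbeta. It also uses a hypothetical greedy plan that moves surplus replicas from overloaded devices to underloaded ones. The function names `resample_counts` and `balance_plan` are illustrative only. In a real CUDA+MPI code, each `(src, dst, n)` transfer would become a message between the MPI ranks that drive the corresponding GPUs.

```python
import math
import random
from typing import List, Tuple


def resample_counts(energies: List[float], dbeta: float,
                    target_size: int, rng: random.Random) -> List[int]:
    """Copies of each replica after one population-annealing step
    beta -> beta + dbeta, using nearest-integer resampling (assumed rule)."""
    e_min = min(energies)  # shift energies so the largest weight is 1
    weights = [math.exp(-dbeta * (e - e_min)) for e in energies]
    norm = sum(weights)
    counts = []
    for w in weights:
        tau = target_size * w / norm  # expected number of copies
        base = math.floor(tau)
        # keep floor(tau) copies, plus one more with probability frac(tau)
        counts.append(base + (1 if rng.random() < tau - base else 0))
    return counts


def balance_plan(loads: List[int]) -> List[Tuple[int, int, int]]:
    """Hypothetical greedy plan of (src, dst, n) transfers that evens out
    replica counts across devices; not the method from the paper."""
    total, n_dev = sum(loads), len(loads)
    # target: total // n_dev per device, the first (total % n_dev) get one extra
    targets = [total // n_dev + (1 if d < total % n_dev else 0)
               for d in range(n_dev)]
    surplus = [[d, loads[d] - targets[d]] for d in range(n_dev)
               if loads[d] > targets[d]]
    deficit = [[d, targets[d] - loads[d]] for d in range(n_dev)
               if loads[d] < targets[d]]
    plan = []
    i = j = 0
    # total surplus equals total deficit, so both lists are exhausted together
    while i < len(surplus) and j < len(deficit):
        n = min(surplus[i][1], deficit[j][1])
        plan.append((surplus[i][0], deficit[j][0], n))
        surplus[i][1] -= n
        deficit[j][1] -= n
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return plan


if __name__ == "__main__":
    rng = random.Random(0)
    energies = [rng.uniform(-2.0, 0.0) for _ in range(1000)]
    counts = resample_counts(energies, dbeta=0.05, target_size=1000, rng=rng)
    print(sum(counts))  # fluctuates around 1000
    print(balance_plan([10, 2, 6, 2]))  # [(0, 1, 3), (0, 3, 2), (2, 3, 1)]
```

Resampling lets the population size fluctuate around the target, so per-device loads drift after every temperature step. Rebalancing with a few large point-to-point transfers, instead of reshuffling every replica, is one plausible way to limit communication while keeping each GPU evenly loaded.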
