Abstract

Astrophysical direct N-body methods were among the first production algorithms to be implemented using NVIDIA’s CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 N-body library, which allows researchers to use the GPU for N-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features, versatility in numerical precision, and support for higher order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher order integrators. We show how to tune these codes for different GPU architectures and how to keep utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than the first version, even with the added options and double precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL.

Highlights

  • The class of algorithms commonly referred to as direct N-body algorithms is still one of the most commonly used methods for simulations in astrophysics

  • In this paper we present our direct N-body library, Sapporo2. Since we focus on the library, we will not make a full comparison with the standalone software packages mentioned above

  • In general, a Graphics Processing Unit (GPU) requires a large number of these blocks to saturate the device and hide most of the latencies that originate from communication with the off-chip memory


Summary

Introduction

The class of algorithms commonly referred to as direct N-body algorithms is still one of the most commonly used methods for simulations in astrophysics. These algorithms are computationally expensive, as they scale as $O(N^2)$. This makes the method unsuitable for large N; for these large-N simulations one usually resorts to a lower precision method such as the Barnes-Hut treecode method (Barnes and Hut) or the Particle Mesh method, both of which scale as $O(N \log N)$ (e.g. Hohl and Hockney; Hockney and Eastwood). These methods, although faster, are notably less accurate and not suitable for simulations that rely on the high accuracy that direct summation, coupled with higher order integrators, offers. On the other end of the spectrum you can find even
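The quadratic cost of direct summation can be illustrated with a minimal sketch. This is not the Sapporo2 implementation or its API: the function name `direct_accelerations`, the softening length `eps`, and the unit choice `G = 1` are assumptions made only for illustration. The code computes the gravitational acceleration on every particle by summing over all other particles, and counts how many pairwise interactions were evaluated.

```python
import math


def direct_accelerations(pos, mass, eps=1.0e-3, G=1.0):
    """Direct-summation accelerations for N particles (illustrative sketch).

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (an assumption, not taken from the source)
    G    : gravitational constant in code units (assumed to be 1)

    Returns (acc, pair_evaluations), where acc is a list of [ax, ay, az].
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_evaluations = 0

    for i in range(n):          # particle that feels the force
        for j in range(n):      # particle that exerts the force
            if i == j:
                continue        # no self-interaction
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            # Softened squared distance avoids division by zero at r = 0.
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += G * mass[j] * dx * inv_r3
            acc[i][1] += G * mass[j] * dy * inv_r3
            acc[i][2] += G * mass[j] * dz * inv_r3
            pair_evaluations += 1

    return acc, pair_evaluations
```

The double loop evaluates $N(N-1)$ interactions, so doubling the particle count roughly quadruples the work. Every interaction is independent of the others, which is what makes this kind of computation a natural fit for a GPU.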

