Abstract
XSBench is a proxy application used to study the performance of constructing nuclear macroscopic cross-section data, which is usually the most time-consuming step in Monte Carlo neutron transport simulations. In this technical note we report on our experience in optimizing XSBench for Intel multicore central processing units (CPUs), Intel Many Integrated Core (MIC) coprocessors, and Nvidia graphics processing units (GPUs). Our benchmark is the continuous-energy cross-section construction in the Monte Carlo simulation of the Hoogenboom-Martin large problem. We demonstrate that several tuning techniques, particularly data prefetch, substantially improve the performance of XSBench on each platform compared to the original implementation on the same platform. The performance gains are 1.46× on the Westmere CPU, 1.51× on the Haswell CPU, 2.25× on the Knights Corner (KNC) MIC, and 5.98× on the Kepler GPU. Across platforms, with the high-end Haswell CPU as the baseline, the KNC MIC is 1.63× faster and the high-end Kepler GPU is 2.20× faster.
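For readers unfamiliar with the kernel being tuned, the following is a minimal sketch of a continuous-energy macroscopic cross-section lookup, written in Python for readability. It is not XSBench's code: the function names, the `(density, energy_grid, xs_values)` tuple layout, and the two-nuclide material are hypothetical, and linear interpolation on a per-nuclide energy grid is assumed. Each nuclide's table is reached through an energy-dependent index, which is the kind of irregular memory access that a technique such as data prefetch is meant to help with.

```python
from bisect import bisect_right

def micro_xs(energy_grid, xs_values, energy):
    """Linearly interpolate one nuclide's microscopic cross section at `energy`."""
    # Locate the grid interval containing the energy; clamp to the first or
    # last interval, so energies outside the table are linearly extrapolated.
    i = bisect_right(energy_grid, energy) - 1
    i = max(0, min(i, len(energy_grid) - 2))
    e_lo, e_hi = energy_grid[i], energy_grid[i + 1]
    f = (energy - e_lo) / (e_hi - e_lo)
    return xs_values[i] + f * (xs_values[i + 1] - xs_values[i])

def macro_xs(material, energy):
    """Sum (number density * microscopic cross section) over a material's nuclides."""
    total = 0.0
    for density, energy_grid, xs_values in material:
        # Each nuclide has its own table, so every iteration touches a
        # different, data-dependent memory location.
        total += density * micro_xs(energy_grid, xs_values, energy)
    return total

# Hypothetical two-nuclide material: (number density, energy grid, cross sections).
material = [
    (2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]),
    (0.5, [1.0, 3.0], [4.0, 8.0]),
]
sigma = macro_xs(material, 2.5)
```

In a real simulation this lookup would run once per particle event over materials with many nuclides, so the binary search and the scattered table reads are likely to dominate the cost.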