Abstract

This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms, respectively, without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. It was observed that both codes ran slower on the MIC than on a 6-core X5650 CPU, by a factor of ~4 for the MPI code and, abnormally, ~20 for the OpenMP code, and both exhibited limited strong-scaling capability. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulations. The functions of this module are called frequently during the particle random walk. To identify the performance bottleneck, we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory latency bound on the MIC. This study suggests that, despite the low initial porting effort, Monte Carlo codes do not naturally lend themselves to the MIC platform, just as with GPUs, and that the memory latency problem needs to be addressed in order to achieve a decent performance gain.
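As a concrete illustration of the kind of geometry query a CSG proxy application exercises during the particle random walk, the following is a minimal C++ sketch of locating a particle in a cell by evaluating surface senses. It is not the ARCHER or MCNP implementation; the types, names, and plane-only surface model are simplifying assumptions made for illustration.

    // Minimal, illustrative CSG cell lookup (not the ARCHER/MCNP code).
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Surface {              // simplified surface: a plane a*x + b*y + c*z + d = 0
        double a, b, c, d;
        double evaluate(double x, double y, double z) const {
            return a * x + b * y + c * z + d;
        }
    };

    struct Cell {                                          // intersection of half-spaces
        std::vector<std::pair<std::size_t, int>> bounds;   // (surface index, required sign)
    };

    // Return the index of the cell containing point (x, y, z), or -1 if none.
    long locate(const std::vector<Surface>& surfaces,
                const std::vector<Cell>& cells,
                double x, double y, double z) {
        for (std::size_t c = 0; c < cells.size(); ++c) {
            bool inside = true;
            for (const auto& b : cells[c].bounds) {
                double s = surfaces[b.first].evaluate(x, y, z);
                if ((s >= 0.0 ? +1 : -1) != b.second) { inside = false; break; }
            }
            if (inside) return static_cast<long>(c);
        }
        return -1;
    }

    int main() {
        std::vector<Surface> surfaces = {{1, 0, 0, -1}, {-1, 0, 0, 3}};  // planes x = 1 and x = 3
        std::vector<Cell> cells(1);
        cells[0].bounds = {{0, +1}, {1, +1}};                            // slab between the two planes
        return locate(surfaces, cells, 2.0, 0.0, 0.0) == 0 ? 0 : 1;      // point (2, 0, 0) lies inside
    }

Each lookup of this kind chases through per-cell bound lists and surface records, so the access pattern is irregular, which is consistent with the memory-latency-bound behaviour reported in the profiling above.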

Highlights

  • This study suggests that, despite the low initial porting effort, Monte Carlo codes do not naturally lend themselves to the Many Integrated Core coprocessor (MIC) platform, just as with Graphics Processing Units (GPUs), and that the memory latency problem needs to be addressed in order to achieve a decent performance gain.

  • In recent years, hardware acceleration using Many Integrated Core coprocessors (MICs) made by Intel or Graphics Processing Units (GPUs) made by Nvidia has become increasingly common in scientific computing.

  • Examples include the development of the CUDA runtime Application Programming Interface (API), built on the original low-level driver API, which significantly reduces the amount of boilerplate code and improves readability, and "unified memory", which eases the burden of memory management to some extent by eliminating the need for explicit data copies (see the sketch after this list).
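As a hedged illustration of the unified-memory idea in the last highlight, the sketch below uses a single managed allocation that is visible to both host and device, so no explicit copy calls are needed around the kernel launch. The kernel and variable names are hypothetical.

    // Minimal sketch: CUDA unified memory instead of explicit copies.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float *x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1 << 20;
        float *x = nullptr;

        // One managed allocation, accessible from both host and device.
        cudaMallocManaged(&x, n * sizeof(float));
        for (int i = 0; i < n; ++i) x[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
        cudaDeviceSynchronize();   // wait for the kernel before the host reads x

        printf("x[0] = %f\n", x[0]);
        cudaFree(x);
        return 0;
    }

With the classic driver or runtime API, the same program would need separate host and device allocations plus explicit host-to-device and device-to-host copies around the kernel launch.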


Introduction

In recent years, hardware acceleration using Many Integrated Core coprocessors (MICs) made by Intel or Graphics Processing Units (GPUs) made by Nvidia has become increasingly common in scientific computing. Two specific questions from developers are: (1) how hard is it to port existing codes to accelerators, how good is the performance, and what is the bottleneck? (2) how hard is it to perform accelerator-specific optimization? The MICs (Knights Corner generation) and GPUs (Kepler and Maxwell generations) are not binary compatible with CPUs, which means existing programs cannot run directly on these accelerators. For GPUs, the codes need to be rewritten in Nvidia's GPU-specific Application Programming Interface (API) called CUDA [3]. Alternative approaches do exist, such as the compiler-directive-based OpenACC [4] and newer versions of OpenMP (≥ 4.0) [5], which facilitate code porting at the cost of less functionality and lower performance than CUDA.
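For readers unfamiliar with the directive-based alternatives mentioned above, the following is a minimal sketch of offloading a loop with an OpenMP 4.x target construct; the array name and loop body are hypothetical and only illustrate the mechanism, not any of the codes studied here.

    // Minimal sketch: directive-based offload with OpenMP 4.x.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> flux(n, 0.0);
        double *f = flux.data();

        // Map the array to the device, run the loop there, copy results back.
        #pragma omp target teams distribute parallel for map(tofrom: f[0:n])
        for (int i = 0; i < n; ++i)
            f[i] += 1.0;

        printf("flux[0] = %f\n", flux[0]);
        return 0;
    }

The appeal of this style is that the same annotated source can still be compiled for the CPU by ignoring the directives, which is part of what keeps the initial porting effort low.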

