Manycore challenge in particle-in-cell simulation: How to exploit 1 TFlops peak performance for simulation codes with irregular computation

Hiroshi Nakashima

doi:10.1016/j.compeleceng.2015.03.010

Abstract

This paper discusses the challenges of the post-Peta and Exascale era, especially those brought by manycore processors built from ordinary (i.e., non-GPU type) CPU cores. Although such a processor, for example the Intel Xeon Phi, gives us TFlops-class computational power and may lead us to Exascale computing, fully exploiting its potential is far from easy because of the very sources of its high performance, namely large-scale multithreading and a wide SIMD mechanism. In fact, for the three tiers of parallelism, namely inter-node, intra-node and intra-core, we found that this order does not reflect the difficulty of HPC programming; the order should instead be reversed, so that intra-core parallelism is the hardest. Our case study with a particle-in-cell plasma simulation code supports this observation: simply porting an existing code to Xeon Phi is infeasible from the viewpoint of performance, and the code structure must be changed significantly so that it conforms to the features of the processor. However, the study also confirms that the recoding effort is well rewarded, achieving a single-node performance higher than that obtained on four dual-socket nodes of a Cray XE6.
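
The abstract does not say how the code was restructured, so the following is only a generic illustration of why particle-in-cell computation is called irregular. It is a minimal sketch, assuming nearest-grid-point charge deposition on a 1D periodic grid. All function names are hypothetical and none of this is taken from the paper. Depositing in particle order writes to grid cells in a data-dependent order, which is awkward for wide SIMD units. Grouping particles by cell first turns the deposit into one contiguous run per cell.

```python
# Illustrative sketch only; names and scheme are assumptions, not the paper's method.

def cell_of(x, dx, ncells):
    """Index of the grid cell containing position x on a periodic domain."""
    return int(x / dx) % ncells

def deposit_particle_order(positions, charge, dx, ncells):
    """Scatter charge in particle order: grid writes jump between cells."""
    rho = [0.0] * ncells
    for x in positions:
        rho[cell_of(x, dx, ncells)] += charge  # data-dependent write target
    return rho

def deposit_cell_sorted(positions, charge, dx, ncells):
    """Sort particles by cell, then do one accumulation per contiguous run."""
    cells = sorted(cell_of(x, dx, ncells) for x in positions)
    rho = [0.0] * ncells
    i = 0
    while i < len(cells):
        c = cells[i]
        j = i
        while j < len(cells) and cells[j] == c:
            j += 1                   # extend the run of particles in cell c
        rho[c] = charge * (j - i)    # single write per cell
        i = j
    return rho

positions = [0.1, 3.7, 1.2, 3.9, 0.4, 2.5, 1.8, 0.9]
rho_a = deposit_particle_order(positions, 1.0, 1.0, 4)
rho_b = deposit_cell_sorted(positions, 1.0, 1.0, 4)
```

Both functions produce the same charge density; only the memory access pattern differs. Whether the cost of sorting pays off on a given processor is an empirical question that this sketch does not address.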

Journal: Computers & Electrical Engineering
Publication Date: Apr 7, 2015
Citations: 12

Similar Papers

A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems
Guohua You ... Xuejing Wang
Cluster Computing | VOL. 23
01 Jan 2020

Scalability of 3D deterministic particle transport on the Intel MIC architecture
16 Apr 2019

Kalman Filter Tracking on Parallel Architectures
Giuseppe Cerati ... Matevž Tadel
Journal of Physics: Conference Series | VOL. 664
01 Dec 2015

Intel® Xeon Phi™ Coprocessor Architecture and Tools
Rezaur Rahman
01 Jan 2013
