Abstract
This paper discusses the challenge in post-Peta and Exascale era especially that brought by manycore processors of ordinary (i.e., non-GPU type) CPU cores. Though such a processor like Intel Xeon Phi gives us TFlops-class computational power and may lead us to Exascale computing, full exploitation of its potential is far from an easy job due to its source of high performance, namely a large scale multithreading and a wide SIMD mechanism. In fact, in the three-tier parallelism namely inter-node, intra-node and intra-core ones, we found their order does not represent the toughness in HPC programming but the order should be reversed to do that. Our case study with a particle-in-cell plasma simulation code supports our observation revealing that a simple porting of an existing code to Xeon Phi is infeasible from the viewpoint of performance and we have to make a significant change of the code structure so that it conforms with the features of the processor. However the study also confirms that the recoding effort is well rewarded achieving a good single-node performance higher than that obtained from an execution on four dual-socket nodes of Cray XE6.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.