Abstract

We present initial performance comparisons for the Intel Many Integrated Core (MIC), Sandy Bridge (SB), and graphics processing unit (GPU) architectures. A 1D explicit electrostatic particle‐in‐cell code is used to simulate a two‐stream instability in a plasma. We compare the computation times for various numbers of cores/threads and various compiler options. The parallelization is implemented via OpenMP with a maximum of 128 threads. Parallelization and vectorization on the GPU are achieved by modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto‐vectorization and optimization‐level compiler options. Our results show that the MIC is several times slower than SB for a single thread, but becomes faster than SB as the number of cores increases with vectorization switched on. The GPU compute times are consistently about six to seven times shorter than those for the MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedup, however, is almost the same for the MIC and the GPU. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads at which the speedup reaches a maximum. Copyright © 2014 John Wiley & Sons, Ltd.
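
For readers unfamiliar with the method, the following is a minimal sketch of one possible 1D explicit electrostatic particle‐in‐cell loop for a two‐stream setup. It is not the code benchmarked in the paper. The normalized units (electron charge‐to‐mass ratio of -1, uniform ion background of density 1), the linear weighting, the trapezoidal field integration, and all function and parameter names (`init_two_stream`, `deposit_density`, `solve_field`, `push`, `run`) are assumptions made for illustration.

```python
import math
import random


def init_two_stream(n_particles, length, v0, seed=0):
    """Two counter-streaming electron beams with random positions (assumed setup)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, length) for _ in range(n_particles)]
    v = [v0 if i % 2 == 0 else -v0 for i in range(n_particles)]
    return x, v


def deposit_density(x, ng, length):
    """Linear weighting of particles onto a periodic grid of ng nodes."""
    dx = length / ng
    weight = ng / len(x)  # normalizes the mean electron density to 1
    n_e = [0.0] * ng
    for xp in x:
        s = xp / dx
        j = int(s) % ng
        frac = s - int(s)
        n_e[j] += weight * (1.0 - frac)
        n_e[(j + 1) % ng] += weight * frac
    return n_e


def solve_field(n_e, length):
    """Integrate Gauss's law dE/dx = rho, then remove the mean field."""
    ng = len(n_e)
    dx = length / ng
    rho = [1.0 - ne for ne in n_e]  # ion background minus electron density
    e = [0.0] * ng
    for j in range(1, ng):
        e[j] = e[j - 1] + 0.5 * (rho[j - 1] + rho[j]) * dx
    mean = sum(e) / ng
    return [ej - mean for ej in e]


def push(x, v, e, length, dt):
    """Explicit update: interpolate E to each particle, then advance v and x."""
    ng = len(e)
    dx = length / ng
    for i in range(len(x)):  # iterations are independent of one another
        s = x[i] / dx
        j = int(s) % ng
        frac = s - int(s)
        e_p = e[j] * (1.0 - frac) + e[(j + 1) % ng] * frac
        v[i] -= e_p * dt  # assumed electron charge-to-mass ratio of -1
        x[i] = (x[i] + v[i] * dt) % length  # periodic boundary


def run(steps=50, n_particles=2000, ng=64, length=2.0 * math.pi, v0=1.0, dt=0.1):
    """Advance the system for a number of steps and return the final state."""
    x, v = init_two_stream(n_particles, length, v0)
    e = [0.0] * ng
    for _ in range(steps):
        n_e = deposit_density(x, ng, length)
        e = solve_field(n_e, length)
        push(x, v, e, length, dt)
    return x, v, e
```

In a sketch like this, the loop in `push` treats each particle independently, which makes it a natural candidate for thread‐level parallelism and vectorization. The deposit loop writes to shared grid cells, so a parallel version would need some form of reduction or atomic update.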
