Lattice gauge theory on a multi-core processor, Cell/B.E.

Shinji Motoki, Atsushi Nakamura

doi:10.1016/j.procs.2011.04.091

Open Access

https://doi.org/10.1016/j.procs.2011.04.091

Journal: Procedia Computer Science	Publication Date: Jan 1, 2011
Citations: 3	License type: cc-by-nc-nd

Affiliation: Hiroshima University

Abstract

We report our experience implementing a lattice gauge theory code on the Cell Broadband Engine (Cell/B.E.), a new heterogeneous multi-core processor. As a typical operation, we take SU(3) matrix multiplication, one of the most important parts of lattice gauge theory calculations. Taking full advantage of the Cell/B.E., including its SIMD operations and many registers, which allow full use of the arithmetic units through loop unrolling, we obtain about 200 GFLOPS with 16 SPEs, which corresponds to around 80% of the theoretical peak. To our knowledge, this is the fastest speed obtained for this operation on the Cell/B.E. so far. However, when we measure the total time including the data supply, the speed drops to about 13 GFLOPS. We found that the bandwidth of the data transfer between the main memory and the EIB, 25 GB/s, is the bottleneck. In other words, the arithmetic units of the Cell/B.E. can run at 200 GFLOPS, but the current socket structure of the Cell/B.E. prevents this in practice. We discuss several techniques that partially mitigate the problem by reducing the amount of transferred data.
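
The gap between the compute-only and the end-to-end figures can be illustrated with a rough bandwidth estimate. The sketch below is not taken from the paper: it assumes single-precision storage (4 bytes per real number) and a simple traffic model in which each SU(3) product loads two full 3x3 complex matrices and stores one, with no data reuse. The function names are hypothetical.

```python
# Illustrative estimate only. The storage size and the traffic model are
# assumptions, not figures from the paper.

def su3_multiply_flops() -> int:
    """Real floating-point operations in one 3x3 complex matrix product."""
    n = 3
    complex_muls = n * n * n        # 27 complex multiplications
    complex_adds = n * n * (n - 1)  # 18 complex additions
    # One complex multiply costs 6 real flops, one complex add costs 2.
    return complex_muls * 6 + complex_adds * 2


def su3_multiply_bytes(bytes_per_real: int = 4) -> int:
    """Bytes moved when two matrices are loaded and one is stored (assumed)."""
    reals_per_matrix = 3 * 3 * 2    # 9 complex entries, 2 reals each
    return 3 * reals_per_matrix * bytes_per_real


def bandwidth_bound_gflops(bandwidth_gb_per_s: float,
                           bytes_per_real: int = 4) -> float:
    """Upper bound on throughput when memory bandwidth is the only limit."""
    intensity = su3_multiply_flops() / su3_multiply_bytes(bytes_per_real)
    return bandwidth_gb_per_s * intensity


if __name__ == "__main__":
    print(su3_multiply_flops())                     # 198
    print(su3_multiply_bytes())                     # 216
    print(round(bandwidth_bound_gflops(25.0), 1))   # 22.9
```

Under these assumptions, a 25 GB/s link caps the operation at a few tens of GFLOPS at most, far below 200 GFLOPS and of the same order as the roughly 13 GFLOPS reported once data supply is included. A different storage precision or traffic pattern would change the exact number but not the conclusion that the transfer, not the arithmetic, sets the limit.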
