Efficient SIMDization and Data Management of the Lattice QCD Computation on the Cell Broadband Engine

Khaled Z Ibrahim, François Bodin

doi:10.1155/2009/634756

Abstract

Lattice Quantum Chromodynamics (QCD) models subatomic interactions based on a four-dimensional discretized space–time continuum. The Lattice QCD computation is one of the grand challenges in physics, especially when modeling a lattice with small spacing. In this work, we study the implementation, on the Cell Broadband Engine, of the main kernel routine of Lattice QCD, which dominates the execution time. We tackle two problems: efficient SIMD execution, and the limited bandwidth for data transfers to and from off-chip memory. For efficient SIMD execution, we present a runtime data fusion technique that groups data processed similarly at runtime. We also introduce the analysis needed to reduce the pressure on the scarce memory bandwidth that limits the performance of this computation. We studied two implementations of the main kernel routine that exhibit different memory access patterns and thus allow different sets of optimizations. We show the attributes that make one implementation more favorable in terms of performance. For a lattice size that is significantly larger than the local store, our implementation achieves 31.2 GFlops for single precision computations and 16.6 GFlops for double precision computations on the PowerXCell 8i, an order of magnitude better than the performance achieved on most general-purpose processors.

Highlights

Simulating Lattice Quantum Chromodynamics (QCD) aims at understanding the strong interactions that bind sub-nuclear matter together to form stable nuclear matter [19].
We introduce an implementation of the main kernel routine for simulating Lattice QCD.
We investigated the tradeoffs affecting the efficiency of these implementations on the Cell Broadband Engine (BE), both for code SIMDization and for managing direct memory transfers.

Summary

Introduction

Simulating Lattice Quantum Chromodynamics (QCD) aims at understanding the strong interactions that bind sub-nuclear matter (quarks and gluons) together to form stable nuclear matter (hadrons) [19]. Efficient implementation of the main kernel routine, which computes the action of the Wilson–Dirac operator, is of critical importance for the simulation of Lattice QCD [4,6,19]. We introduce an implementation of this kernel routine. In this implementation, we try to answer two main questions: first, how to SIMDize the computation efficiently; second, how to distribute the lattice data and manage memory efficiently.
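For orientation, the action of the Wilson–Dirac operator on a spinor field is often written in the hopping form below. This is a common textbook convention given here as an assumption; the normalization and sign conventions used in the paper itself may differ. The symbols are not taken from this page: $\psi(x)$ is the spinor field at lattice site $x$, $U_\mu(x)$ is the SU(3) gauge link leaving $x$ in direction $\mu$, $\gamma_\mu$ are the Dirac matrices, $\hat{\mu}$ is the unit lattice vector in direction $\mu$, and $\kappa$ is the hopping parameter.

```latex
D\,\psi(x) \;=\; \psi(x) \;-\; \kappa \sum_{\mu=1}^{4}
  \Big[ (1-\gamma_\mu)\, U_\mu(x)\, \psi(x+\hat{\mu})
      \;+\; (1+\gamma_\mu)\, U_\mu^{\dagger}(x-\hat{\mu})\, \psi(x-\hat{\mu}) \Big]
```

In this form each site gathers contributions from its eight nearest neighbours, and every contribution applies the same kind of small matrix product to different data. That regularity is what makes the two questions above, SIMD execution and data placement, central to performance.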

Cell Broadband Engine and its software development environment

Lattice QCD main kernel routine

Computation models for the Wilson–Dirac kernel routine

SIMDizing the main kernel computations on the Cell Broadband Engine

Runtime data fusion

Lattice QCD memory management

Contiguity analysis of the data space

Performance with DMA

Performance scaling of the introduced implementation

SPEs utilization

Scaling of the proposed scheme on a large scale system

Findings

Conclusions


Journal: Scientific Programming
Publication Date: Jan 1, 2009
Citations: 18
License type: CC BY 3.0


