On the performance improvement of a parallel 3-D PIC-FEM code

J.-S. Wu, K.-H. Hsu, C.-T. Hung

doi:10.1109/plasma.2006.1706895

Abstract

Summary form only given. At the previous ICOPS meeting, we presented a parallel 3-D PIC code that uses the finite-element method on an unstructured tetrahedral mesh, which gives flexibility in modeling objects with complex geometry. In addition, dynamic domain decomposition using a graph-partitioning technique is employed for better load balancing among the processors. The parallel efficiency of this code, implemented on HP clusters, can be as high as 82% with 32 processors (40 particles per cell, ~30,000 nodes). However, one of the major drawbacks of this code is its relatively poor runtime performance compared with earlier PIC codes that use the finite-difference method on a structured mesh. In this paper, we present improvements to the Poisson equation solver and to the particle tracing technique that greatly enhance the code performance. First, we have replaced the original parallel conjugate gradient method by either a sparse direct matrix solver (MUMPS) for fewer processors ( 10). With MUMPS for fewer processors, the assembled coefficient matrix is factorized into the L and U matrices once at the start, and the factors are stored for reuse at each time step. At each time step, only the source term (charge density) changes while the L and U matrices remain unchanged, which makes solving the matrix equation very fast. Second, the tetrahedral mesh is replaced by a multi-block hybrid structured-unstructured mesh to maintain both the flexibility of dealing with complicated geometry and the maximal efficiency of particle tracing. In the structured-mesh block, with pure hexahedral cells, particle tracing takes advantage of the simple relation between mesh coordinate and mesh index, which is very fast. In the unstructured-mesh block, with mixed tetrahedral and pyramid cells, a similar technique is adopted.
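The factor-once, solve-many strategy that the abstract describes for the direct solver can be illustrated with a minimal sketch. It is not the authors' implementation and does not call MUMPS: a small dense LU factorization in pure Python stands in for the sparse direct solver, and a 1-D tridiagonal matrix stands in for the assembled FEM coefficient matrix. The function names (`lu_factor`, `lu_solve`, `poisson_matrix_1d`, `run_time_steps`) are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: a tiny dense LU stands in for a sparse direct
# solver such as MUMPS, and a 1-D model matrix stands in for the FEM matrix.


def lu_factor(a):
    """Doolittle LU factorization without pivoting.

    Pivoting is omitted for brevity; this is safe for the symmetric
    positive-definite model matrix used below, not for general matrices.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward substitution, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


def poisson_matrix_1d(n):
    """Hypothetical stand-in for the coefficient matrix: tridiagonal (-1, 2, -1)."""
    return [
        [2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
        for i in range(n)
    ]


def run_time_steps(n, sources):
    """Factorize once, then reuse L and U for every right-hand side."""
    lower, upper = lu_factor(poisson_matrix_1d(n))  # expensive step, done once
    # Each "time step" only changes the source term (charge density).
    return [lu_solve(lower, upper, rho) for rho in sources]
```

The point of the design is that the cubic-cost factorization is paid a single time, while each time step costs only two triangular solves. A real sparse direct solver would additionally exploit sparsity and reordering, which this sketch ignores.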
An RF capacitive discharge between two circular electrodes in a hexahedral metal chamber is used to demonstrate the performance improvement. Preliminary results show that the runtime can be reduced by up to a factor of five in the test example.
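The "simple relation between mesh coordinate and mesh index" mentioned for the structured-mesh block can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the block is assumed to be axis-aligned with uniform spacing, so that the cell index follows from one division per axis. The function name `locate_cell` and its parameters (`origin`, `spacing`, `cells`) are hypothetical.

```python
import math


def locate_cell(position, origin, spacing, cells):
    """Return the (i, j, k) index of the hexahedral cell containing `position`.

    Assumes a uniform, axis-aligned structured block (an assumption of this
    sketch). Returns None when the particle lies outside the block, in which
    case a caller would have to search a neighbouring block instead.
    """
    index = []
    for x, x0, dx, n in zip(position, origin, spacing, cells):
        i = math.floor((x - x0) / dx)  # direct coordinate-to-index mapping
        if i < 0 or i >= n:
            return None
        index.append(i)
    return tuple(index)
```

For example, with `origin=(0.0, 0.0, 0.0)`, `spacing=(0.5, 0.5, 0.5)` and `cells=(4, 4, 4)`, a particle at `(0.75, 1.9, 0.1)` maps to cell `(1, 3, 0)`. The lookup costs a fixed number of arithmetic operations per particle, with no search over neighbouring cells.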
