Abstract

Based on the vectorised and cache optimised kernel, a parallel lower upper decomposition with a novel communication avoiding pivoting scheme is developed to solve dense complex matrix equations generated by the method of moments. The fine-grain data rearrangement and assembler instructions are adopted to reduce memory accessing times and improve CPU cache utilisation, which also facilitate vectorisation of the code. Through grouping processes in a binary tree, a parallel pivoting scheme is designed to optimise the communication pattern and thus reduces the solving time of the proposed solver. Two large electromagnetic radiation problems are solved on two supercomputers, respectively, and the numerical results demonstrate that the proposed method outperforms those in open source and commercial libraries.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call