Abstract

This paper presents in detail a full implementation of the higher-order method of moments (HMoM) with a parallel out-of-core LU solver on a GPU/CPU platform. The implementation has three parts. First, a global auxiliary table and a local auxiliary table are introduced to eliminate many tedious, repetitive calculations, and a GPU-oriented implementation is proposed and optimized. Second, an overlapped grouping of all the curved quadrilaterals is proposed; with this scheme, all submatrices can be generated efficiently one by one, without wasted calculations, using both the video memory and the host memory. Third, a GPU-based out-of-core algorithm for LU decomposition is proposed and further developed into a hybrid GPU/CPU algorithm. Numerical examples test the robustness of the proposed algorithm by comparison with measurements and/or the traditional MoM with RWG basis functions, and demonstrate its overall performance by comparison with existing algorithms for similar problems. For generating the HMoM matrix, the proposed algorithm achieves a speedup of about 7 to 12 over the GPU-based algorithm in the literature. For LU decomposition, its speedup over an 8-threaded CPU-based algorithm exceeds 13 in single precision and 7 in double precision.
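As a rough illustration of the block structure that an out-of-core LU solver typically relies on, the following is a minimal Python sketch of a right-looking blocked LU factorization. It is not the algorithm of the paper: it keeps the whole matrix in memory, runs on the CPU, and omits pivoting (so it assumes a matrix that is safe to factor without row exchanges, such as a diagonally dominant one). The names `blocked_lu`, `reconstruct`, and the `block` parameter are hypothetical. In an out-of-core setting, each of the three steps would operate on blocks streamed between disk, host memory, and video memory.

```python
def blocked_lu(a, block):
    """Factor the square matrix `a` (list of lists) in place as A = L * U.

    Illustrative assumptions: no pivoting, everything held in memory.
    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle, including the diagonal, holds U.
    """
    n = len(a)
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)

        # Step 1: factor the current column panel (rows k0..n-1, cols k0..k1-1).
        for k in range(k0, k1):
            pivot = a[k][k]
            for i in range(k + 1, n):
                a[i][k] /= pivot
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]

        # Step 2: block row of U via forward substitution, U12 = L11^{-1} * A12.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                lik = a[i][k]
                for j in range(k1, n):
                    a[i][j] -= lik * a[k][j]

        # Step 3: trailing-matrix update, A22 -= L21 * U12.
        # This matrix product dominates the cost and is the natural
        # candidate for GPU offloading.
        for i in range(k1, n):
            for k in range(k0, k1):
                lik = a[i][k]
                for j in range(k1, n):
                    a[i][j] -= lik * a[k][j]
    return a


def reconstruct(lu):
    """Multiply the packed factors back together to check that L * U = A."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                total += l_ik * lu[k][j]
            out[i][j] = total
    return out


if __name__ == "__main__":
    matrix = [
        [4.0, 1.0, 2.0, 0.5],
        [1.0, 5.0, 1.0, 1.0],
        [2.0, 1.0, 6.0, 1.0],
        [0.5, 1.0, 1.0, 7.0],
    ]
    packed = blocked_lu([row[:] for row in matrix], block=2)
    product = reconstruct(packed)
    error = max(
        abs(product[i][j] - matrix[i][j]) for i in range(4) for j in range(4)
    )
    print("max reconstruction error:", error)
```

The same code works unchanged for complex-valued entries, which is the usual case for MoM impedance matrices. The block size is the tuning knob: in an out-of-core scheme it would be chosen so that the panel and the active part of the trailing matrix fit in the available memory.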
