Abstract

A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five-dimensional Eulerian code GT5D, and its performance is compared against the original code, which uses a generalized conjugate residual (GCR) method, on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and Oakforest-PACS (KNL). Although the CA-GMRES method dramatically reduces the number of data reduction communications, it requires substantially more computation than the GCR method. To resolve this issue, we propose a modified CA-GMRES method, which reduces both computation and memory access by ~30% while keeping the same CA property as the original CA-GMRES method. The modified CA-GMRES method has ~3.8x higher arithmetic intensity than the GCR method and is thus suitable for future exascale architectures with limited memory and network bandwidths. The CA-GMRES solver is implemented using a hybrid CA approach, in which we apply CA to data reduction communications and use communication overlap for halo data communications, and it is highly optimized for distributed caches on KNL. Compared with the GCR solver, its computing kernels are accelerated by 1.47x to 2.39x, and the cost of data reduction communication is reduced from 5% to 13% down to ~1% of the total cost at 1,280 nodes.
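
The reduction-saving idea behind communication-avoiding Krylov methods can be illustrated with a minimal sketch. It is not the GT5D implementation: the abstract does not state which orthogonalization the solver uses, so Cholesky QR is assumed here as one common choice, and the 1-D Laplacian operator, the function names, and the reduction counter are all hypothetical. The sketch builds s+1 Krylov vectors with local matrix-vector products, then orthonormalizes them using a single Gram matrix, which in a distributed run would be one batched global reduction rather than one reduction per dot product.

```python
import math

def matvec(x):
    """Apply a 1-D Laplacian stencil (assumed toy operator, not GT5D's)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def monomial_basis(v, s):
    """Return the s+1 vectors v, Av, ..., A^s v (no reductions needed)."""
    basis = [list(v)]
    for _ in range(s):
        basis.append(matvec(basis[-1]))
    return basis

def gram_matrix(cols, counter):
    """All pairwise dot products; counted as ONE batched reduction."""
    counter["reductions"] += 1
    m = len(cols)
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(m)]
            for i in range(m)]

def cholesky_lower(g):
    """Lower-triangular factor L with L * L^T = g (small dense matrix)."""
    m = len(g)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def cholesky_qr(cols, counter):
    """Orthonormalize the columns using a single Gram-matrix reduction."""
    low = cholesky_lower(gram_matrix(cols, counter))
    m, n = len(cols), len(cols[0])
    q = [[0.0] * n for _ in range(m)]
    for r in range(n):          # row-wise triangular solve, purely local
        for j in range(m):
            acc = cols[j][r] - sum(low[j][k] * q[k][r] for k in range(j))
            q[j][r] = acc / low[j][j]
    return q

def mgs_reduction_count(m):
    """Reductions for Gram-Schmidt on m vectors if every dot product and
    norm is issued as its own global reduction (an assumed cost model)."""
    return m * (m + 1) // 2

counter = {"reductions": 0}
start = [float(i + 1) for i in range(10)]
q = cholesky_qr(monomial_basis(start, 3), counter)
print("batched reductions:", counter["reductions"])
print("one-at-a-time reductions:", mgs_reduction_count(len(q)))
```

Under this cost model, four basis vectors need one batched reduction instead of ten separate ones. The monomial basis becomes ill-conditioned as s grows, so the sketch uses a small s; the extra matrix-vector and Gram-matrix work is also a simple instance of the added computation the abstract mentions.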
