Abstract
The current barrier to large-scale parallelism in ocean general circulation models (OGCMs) makes it difficult to meet the computing demands of high-resolution simulation. Taking into account both the computational characteristics of OGCMs and the heterogeneous many-core architecture of the new Sunway supercomputer, we develop swNEMO_v4.0, based on NEMO4 (Nucleus for European Modelling of the Ocean version 4), with ultrahigh scalability. Our work presents three innovations: (1) A highly adaptive, efficient four-level parallelization framework for OGCMs is proposed to release a new level of parallelism along the compute-dependency column dimension. (2) A many-core optimization method using blocking by remote memory access (RMA) and a dynamic cache scheduling strategy is applied, effectively utilizing the temporal and spatial locality of data. Tests show that the actual direct memory access (DMA) bandwidth exceeds 90 % of the ideal bandwidth after optimization, with a maximum of 95 %. (3) A mixed-precision optimization method with half, single and double precision is explored, which can effectively improve computational performance while maintaining the simulation accuracy of OGCMs. The results demonstrate that swNEMO_v4.0 has ultrahigh scalability, achieving up to 99.29 % parallel efficiency at a resolution of 500 m using 27 988 480 cores and reaching a peak performance of 1.97 PFLOPS.
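To make the trade-off behind mixed precision concrete, the following minimal sketch accumulates the same values using different storage and accumulator precisions. It is a generic illustration and not the scheme used in swNEMO_v4.0: the helper names (`to_half`, `to_single`, `accumulate`) and the test data are hypothetical, and half and single precision are only emulated by rounding Python doubles through the standard `struct` module.

```python
import struct


def to_half(x: float) -> float:
    """Round a double to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_double(x: float) -> float:
    """Python floats are already doubles, so no rounding is applied."""
    return x


def accumulate(values, store, acc):
    """Sum values that are stored with `store` precision,
    rounding the running total to `acc` precision after every addition."""
    total = 0.0
    for v in values:
        total = acc(total + store(v))
    return total


# Hypothetical test data: 10 000 copies of 0.1, whose exact decimal sum is 1000.
N = 10_000
values = [0.1] * N
reference = 1000.0

errors = {
    "half storage, half accumulator": abs(accumulate(values, to_half, to_half) - reference),
    "single storage, single accumulator": abs(accumulate(values, to_single, to_single) - reference),
    "single storage, double accumulator": abs(accumulate(values, to_single, to_double) - reference),
    "double storage, double accumulator": abs(accumulate(values, to_double, to_double) - reference),
}

for label, err in errors.items():
    print(f"{label}: absolute error = {err:.3e}")
```

In this toy case, keeping the data in single precision while accumulating in double recovers most of the accuracy lost by a pure single-precision sum, which is one common motivation for mixing precisions rather than lowering precision uniformly.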