Abstract

In this work an ultra-scalable algorithm is designed and optimized to accelerate a 3D compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first reformulate the mesocale model to avoid long-latency operations, and then employ carefully designed inter-node and intra-node domain decomposition algorithms to achieve balance utilization of different computing units. Proper communication-computation overlap and concurrent data transfer methods are utilized to reduce the cost of data movement at scale. A variety of optimization techniques on both the CPU side and the accelerator side are exploited to enhance the in-socket performance. The proposed hybrid algorithm successfully scales to 6,144 Tianhe-2 nodes with a nearly ideal weak scaling efficiency, and achieve over 8 percent of the peak performance in double precision. This ultra-scalable hybrid algorithm may be of interest to the community to accelerating atmospheric models on increasingly dominated heterogeneous supercomputers.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call