Abstract

Matrix–matrix products of small matrices continue to play an important part in a range of scientific applications. Heterogeneous architectures, which are predicted to be a trend in the exascale supercomputing era, give rise to challenges in porting and optimizing small matrix products. We present a method to accelerate and tune small matrix multiplications on the Sunway TaihuLight supercomputer, which has ranked first on the TOP500 list four times. Sunway TaihuLight is equipped with Shen-Wei hybrid manycore processors. We use Nek5000 as a case study to demonstrate our method. Nek5000 is an open-source computational fluid dynamics (CFD) solver for incompressible flow based on the spectral element method (SEM). The high-order SEM, whose computational kernel consists of small dense matrix products, is regarded as having the potential to overcome the constraints of standard CFD software. Vectorization yielded a performance improvement of about 30% on the management processing element (MPE). We further accelerated Nek5000 using the computing processing elements (CPEs); the experimental results suggest that employing 32 CPEs delivers the best performance. We scaled Nek5000 to 16,384 core groups with 540,672 cores, reaching a performance improvement of about 30%.
