Abstract

This paper examines the performance of two many-core architectures, the Tesla K20c GPU and the Xeon Phi 31SP coprocessor, in simulating complicated flow structures with a high-order weighted essentially non-oscillatory (WENO) scheme. Several performance optimization techniques are also discussed in detail. The results show that the K20c GPU can run several times faster than the Xeon Phi 31SP because the Xeon Phi's Vector Processing Units (VPUs) are under-utilized. When its VPUs are fully utilized, the Xeon Phi 31SP can achieve performance equivalent to that of the K20c GPU. The results could serve as a case study to help users select the right many-core architecture for their target application.

