Abstract

The past decade has produced numerous CPU architectural innovations. These have included multiple cores per CPU, multiple simultaneous threads per core, and, especially with GPUs, highly complex memory hierarchies. As a result, performance porta-bility has become a major challenge to programmers. We identify the SIMD engines in modern CPU and GPU cores as the key to obtaining high performance for scientific application codes. This common element of all present computing devices makes performance portability possible. However, we find that achieving this performance requires us to express the code in terms of intrinsic functions for the SIMD engine instructions, and these functions are different for each device. To assist the programmer in creating the necessary code expressions for each vendor's compilers, we have built an automated code translator that takes as input a single Fortran source code, written in a special style and annotated with directives, and creates output code for each device and compiler combination. The manual translations for GPU permit us here to evaluate the performance that our code transformations deliver on these devices. We present a performance study using our single-fluid PPM gas dynamics code and covering the latest multi-core processors and the Nvidia GPU.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call