Abstract
It is crucial to optimize stencil computations since they are the core (and most computational demanding segment) of many Scientific Computing applications, therefore reducing overall execution time. This is not a simple task, actually it is lengthy and tedious. It is lengthy because the large number of stencil optimizations combinations to test, which might consume days of computing time, and the process is tedious due to the slightly different versions of code to implement. Alternatively, models that predict performance can be built without any actual stencil execution, thus reducing the cumbersome optimization task. Previous works have proposed cache misses and execution time models for specific stencil optimizations. Furthermore, most of them have been designed for 2D datasets or stencil sizes that only suit low order numerical schemes. We propose a flexible and accurate model for a wide range of stencil sizes up to high order schemes, that captures the behavior of 3D stencil computations using platform parameters. The model has been tested in a group of representative hardware architectures, using realistic dataset sizes. Our model predicts successfully stencil execution times and cache misses. However, predictions accuracy depends on the platform, for instance on x86 architectures prediction errors ranges between 1-20%. Therefore, the model is reliable and can help to speed up the stencil computation optimization process. To that end, other stencil optimization techniques can be added to this model, thus essentially providing a framework which covers most of the state-of-the-art.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have