Abstract

With the advent of GPU computing, profiling tools are widely used to help developers identify and resolve performance bottlenecks. These tools commonly rely on hardware performance counters to give users access to low-level activities. Given the increasing complexity of modern GPUs and the effort required to associate program behaviors with hardware events, it is not trivial to construct profiling tools with assured correctness. Profiling tools should therefore be rigorously validated to ensure that program behaviors and resource usage are correctly captured and analyzed. To aid this validation, we create a testing prototype, DELTA, on top of the open-source Radeon Open Compute platform (ROCm), to investigate the values of derived profiling metrics and their underlying basic counters. DELTA's tests build on classical microbenchmarks, which can control program behaviors to generate predictable statistics. Unlike prior dissecting works, our tests examine the profiled results and compare them against expected patterns, reporting whether the profiling tools collect and process data correctly. This paper presents the validation methodology and experimental results for cache and main memory on recent GPUs and the ROCm platform, and the case studies demonstrate that the tests are helpful for scrutinizing profiling tools.
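The core validation idea described above — drive a microbenchmark whose event counts are predictable by construction, then check the profiler's reported counter against that expectation — can be sketched as follows. This is an illustrative sketch, not DELTA's actual implementation; the strided-read workload, the 5% tolerance, and the function names are assumptions, and in a real test the profiled value would come from a counter query (e.g. through ROCm's profiling interfaces) rather than a literal.

```python
# Illustrative sketch of counter validation against a predictable
# microbenchmark (NOT DELTA's actual code; names and tolerance are
# assumptions for demonstration).

def expected_loads(n_elements: int, stride: int) -> int:
    """A strided-read microbenchmark touches exactly this many elements,
    so the number of loads it should generate is known by construction."""
    return (n_elements + stride - 1) // stride

def validate_counter(profiled: int, expected: int,
                     tolerance: float = 0.05) -> bool:
    """Pass if the profiled value lies within a relative tolerance of the
    expectation; measurement noise makes exact equality too strict."""
    if expected == 0:
        return profiled == 0
    return abs(profiled - expected) / expected <= tolerance

# A kernel reading every 4th of 2^20 elements should report roughly
# 262144 loads; a counter that is far off indicates a profiling defect.
exp = expected_loads(1 << 20, 4)
print(validate_counter(262000, exp))  # within 5% of expectation -> True
print(validate_counter(500000, exp))  # nearly 2x the expectation -> False
```

A counter that fails such a check points either to a miscollected basic counter or to a wrong formula in the derived metric, which is the distinction the paper's tests aim to surface.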
