Abstract

Accurate performance analysis is critical for understanding application efficiency and for driving software or hardware optimizations. Although most static and dynamic performance analysis tools provide useful information, none is completely satisfactory. Static performance analysis does not give an accurate view because it lacks runtime information (e.g., cache behavior). Profilers, generally combined with hardware counters, provide a wide range of performance metrics but cannot correlate performance information with the appropriate code fragment, data structure, or instruction. Finally, cycle-accurate simulators are too complex and too costly to be used routinely for optimizing real-life applications. This paper presents the Differential Analysis method, an approach designed for simple and automatic detection of performance bottlenecks. The approach relies on DECAN, a tool that generates different binary variants by patching individual instructions or groups of instructions. The variants are then measured and compared, which makes it possible to evaluate the cost of an instruction group and therefore the potential benefit of optimizing it. Differential analysis is illustrated by applying DECAN to a range of HPC applications to detect performance bottlenecks.
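
The comparison step described above can be illustrated with a minimal Python sketch. It is not DECAN's implementation or interface: the function names, the variant labels, and the idea of passing pre-measured cycle counts as plain numbers are all assumptions made for illustration. The sketch only shows the arithmetic of the differential: the time saved by a variant, relative to the unmodified binary, is taken as an estimate of the cost of the instruction group that the variant patched.

```python
from typing import Dict, List, Tuple


def estimate_group_costs(reference_cycles: float,
                         variant_cycles: Dict[str, float]) -> Dict[str, float]:
    """Estimate the relative cost of each patched instruction group.

    reference_cycles: measured cycles of the unmodified binary.
    variant_cycles:   measured cycles of each patched variant, keyed by a
                      hypothetical label naming the patched group.
    Returns, for each variant, the fraction of the reference time that
    disappears when that group is patched.
    """
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    return {
        name: (reference_cycles - cycles) / reference_cycles
        for name, cycles in variant_cycles.items()
    }


def rank_bottlenecks(costs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort instruction groups from largest to smallest estimated cost."""
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)
```

Given measurements for a reference run and for, say, one variant with memory instructions patched and one with floating-point instructions patched (both labels hypothetical), `rank_bottlenecks(estimate_group_costs(ref, variants))` lists the groups in order of estimated optimization potential. A negative value would mean the patched variant ran slower than the reference, which the sketch reports as-is rather than interpreting.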

