Abstract

In the coming exascale era, the complexity of modern applications and hardware resources makes it significantly harder to improve efficiency through execution fine-tuning. To abstract this complexity in an intuitive way, recent application analysis tools rely on insightful modeling, e.g., Intel® Advisor with its Cache-Aware Roofline Model. However, these approaches mainly consider the maximum capabilities of the architecture, which may limit their usability when characterizing real-world applications. To address this issue, a novel Cache-Aware Roofline Model is proposed for more accurate performance modeling of multi-cores, one that realistically reflects application requirements. The proposed fine-grain modeling relies on micro-benchmarking to decouple the attainable performance of the micro-architecture for different utilization scenarios and for a diverse set of functional units and memory levels. Memory subsystem traffic simulation, together with dynamic and static analyses, is also used to derive the requirements of the applications. Experimental results for a real multi-core system with an Intel server processor, and for a set of 13 kernels from exascale proxy applications, show that the proposed models provide more accurate application characterization, optimization hints and bottleneck detection than the state-of-the-art models.
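
For orientation, the general roofline bound on which models of this kind build can be sketched as below. This is only a minimal illustration of the generic bound (attainable performance is the lower of the compute roof and a per-memory-level bandwidth roof), not the fine-grain model proposed in the paper. The function names, the peak value and the per-level bandwidths are hypothetical placeholders, not measurements from the paper.

```python
# Generic roofline bound, sketched with hypothetical numbers.
# In a cache-aware roofline, arithmetic intensity is counted against all
# bytes requested by the core, and each memory level has its own roof.

PEAK_GFLOPS = 100.0  # hypothetical compute roof (GFLOP/s)

# Hypothetical sustained bandwidths as seen from the core (GB/s)
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Return min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * ai)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


def roofs(ai):
    """Attainable performance at intensity `ai` for every memory level."""
    return {
        level: attainable_gflops(ai, PEAK_GFLOPS, bw)
        for level, bw in BANDWIDTH_GBS.items()
    }


if __name__ == "__main__":
    for level, gflops in roofs(0.5).items():
        print(f"{level}: {gflops} GFLOP/s")
```

A kernel whose arithmetic intensity lies below the ridge point of a given memory level is bandwidth-bound at that level under this simple bound; the abstract's point is that roofs built only from maximum capabilities can be too optimistic for real applications.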
