The Performance Application Programming Interface (PAPI) serves as a coherent, operating-system-independent interface for accessing performance counter data across a wide range of hardware and software components. PAPI can operate on its own as a performance monitoring library and tool for application analysis. However, its true value emerges when it functions as middleware for numerous third-party profiling, tracing, and sampling toolkits, establishing itself as a universal interface for hardware counter analysis. In this role, PAPI manages the intricacies of each hardware component and presents a streamlined API to higher-level toolkits. Within the Exascale Computing Project (ECP), PAPI has expanded its capabilities in performance counter monitoring and incorporated support for power management across cutting-edge hardware and software technologies. This includes performance and power monitoring for AMD GPUs through integration with AMD ROCm and ROCm-SMI, Intel Ponte Vecchio GPUs via Intel’s oneAPI Level Zero, and NVIDIA GPUs through the CUPTI Profiling API. PAPI also supports interconnects, the latest CPUs, and ARM chips. These enhancements have been implemented while preserving the standard PAPI interface and methodology for using low-level performance counters in CPUs, GPUs, on/off-chip memory, interconnects, and the I/O system, including energy and power management. To strengthen PAPI’s sustainability, ECP has facilitated its integration into Spack and E4S, ensuring software robustness through continuous integration and continuous deployment. In addition to hardware counter-based data, PAPI now supports the registration and monitoring of Software-Defined Events. This feature exposes the internal behavior of runtime systems and libraries such as PaRSEC, SLATE, and MAGMA to the applications that use them, broadening the scope of performance events to include software-based information.
PAPI has also been extended with the Counter Analysis Toolkit, which uses micro-benchmarks to help disambiguate native performance counters. These micro-benchmarks probe various essential aspects of modern chips and contribute to the classification of raw performance events. In summary, ECP has enabled PAPI to include comprehensive counter analysis capabilities, advanced performance and power monitoring support for exascale hardware components, and a broader scope of performance events that covers software-based information as well as hardware-related metrics.