Abstract
Nowadays, all major processors provide a set of performance counters which capture micro-architectural level information, such as the number of elapsed cycles, cache misses, or instructions executed. Counters can be found in processor cores, processor die, chipsets, or in I/O cards. They can provide a wealth of information as to how the hardware is being used by software. Many processors now support events to measure precisely and with very limited overhead, the traffic between a core and the memory subsystem. It is possible to compute average load latency and bus band-width utilization. This valuable information can be used to improve code quality and placement of threads to maximize hardware utilization.We postulate that performance counters are the key hardware resource to locate and understand issues related to the memory subsystem. In this paper we illustrate our position by showing how certain key memory performance metrics can be gathered easily on today's hardware.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.