Abstract

We describe our memory performance analysis of SPEC2000C using the newly released Intel(R) Itanium(TM) processor (IPF). Memory overhead is very significant for SPEC2000C: on average, 39% of cycles are spent in data stalls. Cache misses are significant, but data translation (DTLB) performance also affects many benchmarks. We present a study based on measurements collected from the hardware performance counters and on cache profiling through program instrumentation of loads and stores. We define important loads as the load sites that contribute at least 95% of the cache misses at all levels. Our measurements show that the number of important loads in a program is relatively small. Our analysis shows that important loads are most often contained in inner loops, and that the trip counts of these loops are high. We present preliminary results on using stride profiling to reduce cache misses of important loads, bringing an average improvement of 6% to SPEC2000C. Finally, we present our study of data translation performance and propose design choices.
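
The definition of important loads can be illustrated with a minimal sketch. It assumes one reading of the definition: rank load sites by miss count and keep the smallest leading group whose misses reach 95% of the total. The function name `important_loads`, the site labels, and the miss counts are hypothetical and are not taken from the paper. The sketch covers a single cache level, whereas the abstract applies the criterion at all levels.

```python
def important_loads(miss_counts, coverage_pct=95):
    """Return the load sites, ranked by miss count, that together account
    for at least coverage_pct percent of all misses (assumed reading of
    the definition; single cache level only)."""
    total = sum(miss_counts.values())
    if total == 0:
        return []
    # Rank load sites from most to fewest misses.
    ranked = sorted(miss_counts.items(), key=lambda kv: kv[1], reverse=True)
    selected, covered = [], 0
    for site, misses in ranked:
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= coverage_pct * total:
            break
        selected.append(site)
        covered += misses
    return selected


# Hypothetical per-site miss counts for one cache level.
profile = {
    "loop_a:ld1": 700,
    "loop_a:ld2": 200,
    "loop_b:ld1": 60,
    "init:ld1": 25,
    "main:ld3": 15,
}
print(important_loads(profile))
```

With this made-up profile, three of the five load sites cover 96% of the misses, which mirrors the observation that the set of important loads is small.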
