Abstract

Due to the mismatch between processor speed and the speed of the memory subsystem, modern processors spend a significant portion (often more than 50%) of their execution time stalling on cache misses. Processor multithreading is an approach that can reduce this stall time; however, multithreading also increases the cache miss rate and demands higher memory bandwidth. In this paper, we present a novel compiler optimization method that improves data locality for each thread and enhances data sharing among threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The resulting threads exhibit a high level of intra-thread and inter-thread data locality, which effectively reduces both the data cache miss rate and the total execution time of numerically intensive computations running on a multithreaded processor.
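The abstract does not give the paper's actual transformation, so the following is only a minimal sketch of the general idea it describes: a loop-tiled kernel in which each thread is assigned a block whose working set fits in cache (intra-thread locality) and concurrently scheduled threads reuse the same read-only tiles (inter-thread sharing). The kernel choice (matrix multiplication), the tile size TILE, and the use of OpenMP for threading are illustrative assumptions, not the authors' method.

```c
/*
 * Hypothetical sketch: tiled, loop-interchanged matrix multiplication.
 * Each thread owns a distinct block of rows of C, so its working set
 * is small and reused (temporal locality); the innermost j loop walks
 * rows of B and C contiguously (spatial locality); threads running
 * concurrently on a multithreaded core read the same rows of B, so
 * they can share cache lines rather than compete for them.
 */
#include <stdio.h>

#define N    512
#define TILE 64   /* assumed tile size; tuned so one tile fits in cache */

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    /* Initialize with arbitrary data. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)(i + j);
            B[i][j] = (double)(i - j);
            C[i][j] = 0.0;
        }

    /* Parallelize over row tiles of C: no two threads write the same
     * element, so no synchronization is needed inside the loop nest. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int k = kk; k < kk + TILE; k++) {
                    double a = A[i][k];
                    for (int j = 0; j < N; j++)
                        C[i][j] += a * B[k][j];
                }

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}
```

Compiled with `cc -O2 -fopenmp`, the tiled version typically shows far fewer data cache misses than the naive i-j-k loop nest, which is the kind of effect the paper's transformation framework aims to achieve systematically.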

