Abstract

Due to the mismatch between processor speed and the speed of the memory subsystem, modern processors spend a significant portion (often more than 50%) of their execution time stalling on cache misses. Processor multithreading is an approach that can reduce this stall time; however, multithreading also increases the cache miss rate and demands higher memory bandwidth. This paper presents a novel compiler optimization method that improves data locality for each thread and enhances data sharing among threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The resulting threads exhibit a high degree of intra-thread and inter-thread data locality, which effectively reduces both the data cache miss rate and the total execution time of numerically intensive computations running on a multithreaded processor.
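To make the idea concrete, the following is a minimal, hand-written sketch of the kind of loop transformation the abstract refers to, not the paper's actual algorithm. The tile size T, the problem size N, and the OpenMP work sharing are assumptions, chosen only to illustrate the two effects described above: each thread reuses its own tiles (intra-thread spatial and temporal locality), while threads walk the same tiles of B in the same phase, so lines fetched by one thread can be reused by others from a shared cache (inter-thread locality).

/*
 * Illustrative sketch only: a hand-tiled matrix multiply, not the
 * compiler transformation described in the paper.  N, T, and the
 * OpenMP scheduling are assumptions for demonstration purposes.
 */
#include <stdio.h>

#define N 512          /* problem size (assumption) */
#define T 64           /* tile size (assumption)    */

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    /* Simple initialization so the kernel has defined inputs. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
            C[i][j] = 0.0;
        }

    /* Tiled i-k-j loop nest: rows of tiles are distributed across
     * threads (intra-thread reuse of A and C tiles), while every
     * thread traverses the same B tiles, promoting inter-thread
     * reuse in a shared cache. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++) {
                        double a = A[i][k];          /* reused across the j loop */
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += a * B[k][j];  /* unit stride: spatial locality */
                    }

    printf("C[0][0] = %f\n", C[0][0]);   /* expected: 2.0 * N = 1024.0 */
    return 0;
}

Compiled with, e.g., gcc -O2 -fopenmp, the tiled nest performs the same arithmetic as the naive triple loop but with far fewer cache misses per thread; a compiler-driven version of this transformation is what the paper automates and extends with inter-thread considerations.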
