Abstract

Due to the mismatch between processor speed and the speed of the memory subsystem, modern processors spend a significant portion (often more than 50%) of their execution time stalling on cache misses. Processor multithreading is an approach that can reduce this stall time; however, multithreading also increases the cache miss rate and demands higher memory bandwidth. This paper presents a novel compiler optimization method that improves data locality for each thread and enhances data sharing among threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The resulting threads exhibit a high degree of intra-thread and inter-thread data locality, which effectively reduces both the data cache miss rate and the total execution time of numerically intensive computations running on a multithreaded processor.
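To make the idea concrete, the following is a minimal, hand-written sketch of the kind of loop transformation the abstract refers to, not the paper's actual algorithm. The tile size T, the problem size N, and the OpenMP work sharing are assumptions, chosen only to illustrate the two effects described above: each thread reuses its own tiles (intra-thread spatial and temporal locality), while threads walk the same tiles of B in the same phase, so lines fetched by one thread can be reused by others from a shared cache (inter-thread locality).

/*
 * Illustrative sketch only: a hand-tiled matrix multiply, not the
 * compiler transformation described in the paper.  N, T, and the
 * OpenMP scheduling are assumptions for demonstration purposes.
 */
#include <stdio.h>

#define N 512          /* problem size (assumption) */
#define T 64           /* tile size (assumption)    */

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    /* Simple initialization so the kernel has defined inputs. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
            C[i][j] = 0.0;
        }

    /* Tiled i-k-j loop nest: rows of tiles are distributed across
     * threads (intra-thread reuse of A and C tiles), while every
     * thread traverses the same B tiles, promoting inter-thread
     * reuse in a shared cache. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++) {
                        double a = A[i][k];          /* reused across the j loop */
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += a * B[k][j];  /* unit stride: spatial locality */
                    }

    printf("C[0][0] = %f\n", C[0][0]);   /* expected: 2.0 * N = 1024.0 */
    return 0;
}

Compiled with, e.g., gcc -O2 -fopenmp, the tiled nest performs the same arithmetic as the naive triple loop but with far fewer cache misses per thread; a compiler-driven version of this transformation is what the paper automates and extends with inter-thread considerations.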
