Abstract

Multicore processors are now the norm, and their cache hierarchies typically include a shared last-level cache. The performance of this shared cache during the execution of multithreaded applications depends on the parallelization scheme chosen; for parallelized loops, for example, the number of threads and the block size are critical parameters. A compiler can guide the selection of the best scheme using heuristics or measured execution times, but heuristics can be imprecise, while an execution-time-guided search is very time-consuming. This paper presents the use of an analytical model to predict the behavior of shared caches during the execution of multithreaded applications that have been parallelized at the loop level. The model predicts the number of misses a given code generates when different numbers of threads or block sizes are used. Since the execution time of the analyzed codes is highly correlated with the number of misses generated in the shared cache, the model's predictions are a powerful tool for selecting the best parallelization scheme.
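To make the selection step concrete, the sketch below enumerates candidate (thread count, block size) pairs and keeps the one with the fewest predicted shared-cache misses, replacing a slow execution-time-guided search. This is a minimal illustration, not the paper's implementation: predict_shared_cache_misses() is a hypothetical stand-in for the analytical model, and the candidate thread counts and block sizes are illustrative.

```c
#include <stdio.h>
#include <stddef.h>
#include <limits.h>

/* Toy stand-in for the analytical cache model: NOT the paper's model.
 * Returns a synthetic miss count so the sketch is self-contained. */
static unsigned long predict_shared_cache_misses(int threads, int block_size)
{
    return (unsigned long)(1000000 / (threads * block_size) + 50 * threads);
}

int main(void)
{
    /* Illustrative candidate parallelization schemes for a blocked loop. */
    const int thread_counts[] = {1, 2, 4, 8};
    const int block_sizes[]   = {16, 32, 64, 128};

    int best_threads = 0, best_block = 0;
    unsigned long best_misses = ULONG_MAX;

    /* Rank every candidate scheme by its predicted miss count instead of
     * by a costly measured execution time. */
    for (size_t i = 0; i < sizeof thread_counts / sizeof *thread_counts; i++) {
        for (size_t j = 0; j < sizeof block_sizes / sizeof *block_sizes; j++) {
            unsigned long m = predict_shared_cache_misses(thread_counts[i],
                                                          block_sizes[j]);
            if (m < best_misses) {
                best_misses  = m;
                best_threads = thread_counts[i];
                best_block   = block_sizes[j];
            }
        }
    }

    printf("best scheme: %d threads, block size %d (%lu predicted misses)\n",
           best_threads, best_block, best_misses);
    return 0;
}
```

Because the model is evaluated analytically rather than by running the code, the whole search costs a handful of function evaluations, which is what makes model-guided selection practical inside a compiler.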
