Abstract

Several techniques have been proposed in the literature for parallelizing irregular reductions on shared memory multiprocessors. They fall into two broad families: those based on privatization of the reduction arrays and those based on partitioning of the reduction arrays. Methods in the first family are simple, but they exploit no data locality and their memory scalability is low. Methods in the second family are more complex, as they require an inspection phase, but they exploit data locality and scale better in memory. Although partitioning-based methods perform well on a wide variety of irregular codes, some input data patterns degrade their performance. In particular, such access patterns may reduce the parallelism the method can exploit or introduce workload imbalances. To mitigate these negative effects, we propose three optimizations for a specific partitioning-based method (DWA–LIP). These optimizations aim to increase the exploited parallelism, balance the workload, and reduce the effect of high-contention regions in the reduction arrays. Efficient implementations of the proposed optimizations for the DWA–LIP method have been tested experimentally and compared with other methods for parallelizing irregular reductions.
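
The contrast between the two families can be illustrated with a minimal sketch. It is not the DWA–LIP method itself: it assumes the simplest irregular reduction, `a[idx[i]] += val[i]` with one indirect write per iteration, and it simulates the threads sequentially. All function and variable names are hypothetical.

```python
def serial_reduction(n, idx, val):
    """Reference result: a[idx[i]] += val[i] for every iteration i."""
    a = [0.0] * n
    for i, v in zip(idx, val):
        a[i] += v
    return a


def privatized_reduction(n, idx, val, threads):
    """Privatization: every (simulated) thread owns a full copy of the array.

    Memory grows as threads * n, and a final merge pass is required.
    """
    private = [[0.0] * n for _ in range(threads)]
    for it, (i, v) in enumerate(zip(idx, val)):
        t = it % threads              # assumed cyclic distribution of iterations
        private[t][i] += v
    return [sum(p[j] for p in private) for j in range(n)]


def partitioned_reduction(n, idx, val, threads):
    """Partitioning: the array is split into blocks, one owner per block.

    An inspector pass assigns each iteration to the owner of the element
    it writes, so no private copies or merge are needed.
    """
    block = (n + threads - 1) // threads
    owned = [[] for _ in range(threads)]
    for it, i in enumerate(idx):      # inspector phase
        owned[i // block].append(it)
    a = [0.0] * n
    for t in range(threads):          # executor phase, one owner at a time
        for it in owned[t]:
            a[idx[it]] += val[it]
    loads = [len(o) for o in owned]   # iterations handled by each owner
    return a, loads


if __name__ == "__main__":
    idx = [0, 5, 5, 2, 7, 5, 1, 5]    # element 5 is a high-contention region
    val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    result, loads = partitioned_reduction(8, idx, val, threads=4)
    print(result, loads)
```

With this input, the owner of element 5 receives half of all iterations, which is the kind of workload imbalance that an input-dependent access pattern can introduce in a partitioning-based scheme.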
