Abstract

In software distributed shared memory (SDSM) systems, the large coherence granularity imposed by the virtual memory page size tends to induce false sharing, which may lead to heavy network traffic or useless page misses at barrier operations. In this paper, we propose a method to reduce the coherence overhead of barrier synchronization in SDSM systems. The method performs static analysis on a shared-memory program to examine data dependencies between processors across global barriers, and then inserts special primitives into the program to exploit the dependency information at run time. If data modified before a barrier will be accessed by some of the other processors after the barrier, the inserted primitives transfer coherence messages only to those processors. Furthermore, if the modified data will not be used by any other processor, the primitives ensure that the coherence messages are delivered only to the master process, after the parallel execution of the program completes. We implemented the static analysis with the SUIF parallelizing compiler and evaluated the execution performance of the modified programs on a 16-node SDSM system supporting the AURC protocol. The experimental results show that our method is very effective at reducing useless coherence messages and can also substantially improve execution time by reducing false sharing misses.
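
The decision made at each barrier can be illustrated with a minimal sketch. It is not the implementation from the paper: the names `plan_barrier`, `BarrierPlan`, `readers_after`, and `MASTER` are hypothetical, the dependency information is assumed to arrive as a plain mapping from page to the set of processors that read it after the barrier, and the broadcast count is only an assumed naive baseline for comparison.

```python
from dataclasses import dataclass, field

MASTER = 0  # hypothetical id of the master process


@dataclass
class BarrierPlan:
    """Where coherence messages for one writer go at one barrier (illustrative)."""
    send_now: dict = field(default_factory=dict)            # page -> sorted reader ids
    deferred_to_master: list = field(default_factory=list)  # pages sent to MASTER at program end


def plan_barrier(writer, dirty_pages, readers_after):
    """Choose destinations for pages that `writer` modified before the barrier.

    `readers_after` stands in for the compiler's dependency information:
    page -> set of processors that access the page after the barrier.
    """
    plan = BarrierPlan()
    for page in sorted(dirty_pages):
        # The writer already holds its own modifications, so exclude it.
        readers = set(readers_after.get(page, set())) - {writer}
        if readers:
            # Some other processors will use the data: notify only those.
            plan.send_now[page] = sorted(readers)
        else:
            # No other processor will use the data: defer delivery to the master.
            plan.deferred_to_master.append(page)
    return plan


def selective_message_count(plan):
    """Messages sent at the barrier under the selective plan."""
    return sum(len(readers) for readers in plan.send_now.values())


def broadcast_message_count(dirty_pages, num_procs):
    """Assumed naive baseline: every dirty page is announced to all other processors."""
    return len(dirty_pages) * (num_procs - 1)
```

For example, if processor 1 modified pages 10, 11, and 12, and only processors 2 and 3 read page 10 afterwards, `plan_barrier(1, {10, 11, 12}, {10: {2, 3}})` sends page 10 to processors 2 and 3 and defers pages 11 and 12 to the master.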
