Abstract

Distributed Shared-Memory (DSM) computers, which partition physical memory among a collection of workstation-like computing nodes, are now a common way to implement parallel machines. Recently, there has been much interest in DSM machines that use software, instead of hardware, to implement the coherence protocols that manage data replication and cache coherence. Software offers many advantages, not the least of which is the possibility of adding significant functionality, such as race detection, to a protocol. This paper describes a new, transparent, protocol-based technique for automatically detecting data races on-the-fly. An implementation of this approach in a DSM system running on a Thinking Machines CM-5 found data races in two of a set of five shared-memory benchmarks. Monitored applications had slowdowns ranging from 0–3 on 32 nodes.

1 Introduction

Race conditions arise in shared-memory parallel programs when accesses to shared memory are not properly synchronized. Because a lack of synchronization can make programs behave unpredictably, there has been much interest in efficient tools for detecting and reporting these race conditions. One promising approach to implementing such tools exploits recent fine-grained distributed shared-memory systems, in which the coherence policies are implemented in software instead of being rigidly encoded in hardware. Experiments have shown that the performance penalties for implementing coherence actions in software are relatively small (especially if there is hardware support for common operations [7, 17]), and that using the flexibility of software to tailor protocols to the needs of applications can result in tremendous performance increases [5]. Research platforms, such as FLASH [7] and Tempest [16], have paved the way for systems from Sequent [8] and DEC [20].
This paper describes a new use for the flexibility offered by software coherence policies: implementing a transparent, protocol-based technique for detecting data races on-the-fly. Fine-grained DSM systems enable efficient, real-time detection of data races, since they already contain a mechanism to invoke the coherence protocol in response to shared-memory accesses. The protocol need only be extended to monitor each access to shared memory and maintain a history of the accesses. The information in these histories is sufficient to detect data races on-the-fly in programs with barrier-only synchronization. Data races can be found in programs with pairwise synchronization by using the access histories in conjunction with additional sequencing information, such as vector timestamps [24], or by using techniques like lockset refinement [19].

Most previous race-detection techniques required support from a custom parallelizing compiler or other tools, which tied race detection to a particular language, implementation, and platform. The fine-grained, protocol-based race-detection scheme described in this paper has low overheads and is completely independent of program source code. Race detection can be performed on programs written in any language, and on library routines whose source may not be available. The approach therefore makes efficient race detection available to a wider audience than was previously the case.

This research was supported by NSF NYI Award CCR-9357779, with support from Sun Microsystems, and by NSF Grant MIP-9625558.

2 Background
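The access-history check that the introduction describes for barrier-only programs can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The class and method names (`BarrierEpochRaceDetector`, `on_read`, `on_write`, `on_barrier`) are hypothetical stand-ins for hooks a software coherence protocol would call on each shared access and at each barrier. Histories are kept per address rather than per coherence block, and processors are identified by small integers. The reasoning is simple. When barriers are the only synchronization, two accesses to the same location by different processors within the same barrier epoch are unordered, so they race if at least one of them is a write. A barrier orders every earlier access before every later one, so the histories can be discarded at each barrier.

```python
from dataclasses import dataclass, field


@dataclass
class AccessHistory:
    """Processors that touched one shared address in the current epoch."""
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)


class BarrierEpochRaceDetector:
    """Hypothetical sketch: flags conflicting accesses by different
    processors that fall between the same pair of barriers.
    Assumes barrier-only synchronization and per-address histories."""

    def __init__(self) -> None:
        self.epoch = 0
        self.history = {}  # address -> AccessHistory for this epoch
        self.races = []    # (epoch, address, kind) tuples

    def _entry(self, addr: int) -> AccessHistory:
        return self.history.setdefault(addr, AccessHistory())

    def on_read(self, proc: int, addr: int) -> None:
        h = self._entry(addr)
        # A write by another processor in this epoch is unordered with this read.
        if h.writers - {proc}:
            self.races.append((self.epoch, addr, "write-read"))
        h.readers.add(proc)

    def on_write(self, proc: int, addr: int) -> None:
        h = self._entry(addr)
        # Any other processor's write or read in this epoch conflicts with this write.
        if h.writers - {proc}:
            self.races.append((self.epoch, addr, "write-write"))
        if h.readers - {proc}:
            self.races.append((self.epoch, addr, "read-write"))
        h.writers.add(proc)

    def on_barrier(self) -> None:
        # The barrier orders all earlier accesses before all later ones,
        # so the per-epoch histories are no longer needed.
        self.history.clear()
        self.epoch += 1


d = BarrierEpochRaceDetector()
d.on_write(0, 0x100)  # processor 0 writes address 0x100
d.on_read(1, 0x100)   # processor 1 reads it in the same epoch: a race
d.on_barrier()
d.on_read(1, 0x100)   # ordered after the write by the barrier: no race
print(d.races)        # [(0, 256, 'write-read')]
```

Clearing the history at each barrier bounds its size by the accesses made in one epoch. Programs that also synchronize pairwise, for example with locks, would need extra ordering information such as the vector timestamps or lockset refinement mentioned above. In those programs, two accesses in the same epoch may still be ordered by a lock handoff.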
