Abstract

To simplify the difficult task of writing fault‐tolerant parallel software, we implemented extensions to the basic functionality of the LINDA, or tuple‐space, programming model. Our approach implements a transaction‐processing mechanism to ensure that tuples are properly handled in the event of a node or communications failure. If a process that retrieves a tuple fails to complete its processing, or if a tuple posting or retrieval message is lost, the system is automatically rolled back to a previous stable state. Processing failures and lost messages are detected by time‐out alarms. Roll‐back is accomplished by reposting the pertinent tuples. Intermediate tuples produced during partial processing are not committed or made available until the process completes. In the absence of faults, system overhead is low. The fault‐tolerance mechanism is implemented at the system level and requires little programmer effort or expertise. Two implementations of the model are discussed, one using a UNIX network of workstations and one using a Transputer network. Data measuring model overhead and some aspects of system performance in the presence of faults are presented for an example system.
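The mechanism described above (withdraw under a transaction, hide intermediate tuples until commit, repost on time‐out) can be illustrated with a minimal single‐process sketch. This is not the authors' implementation, which ran across UNIX workstations and a Transputer network; every name here (`TupleSpace`, `Transaction`, `begin`, `take`, `post`, `expire`) is hypothetical, and the explicit `now` argument is an assumption added so the time‐out can be simulated deterministically.

```python
import time


class TupleSpace:
    """Toy in-memory tuple space with transactional take/post (illustrative only)."""

    def __init__(self):
        self.tuples = []   # committed tuples, visible to every process
        self.active = {}   # transaction id -> Transaction still in flight
        self.next_id = 0

    def begin(self, timeout, now=None):
        """Open a transaction that must commit within `timeout` seconds."""
        now = time.monotonic() if now is None else now
        self.next_id += 1
        txn = Transaction(self, self.next_id, now + timeout)
        self.active[txn.id] = txn
        return txn

    def expire(self, now=None):
        """Time-out alarm: roll back every transaction past its deadline."""
        now = time.monotonic() if now is None else now
        for txn in list(self.active.values()):
            if now >= txn.deadline:
                txn.rollback()


class Transaction:
    def __init__(self, space, id, deadline):
        self.space, self.id, self.deadline = space, id, deadline
        self.taken = []    # tuples withdrawn by this transaction
        self.pending = []  # tuples produced but not yet committed

    def take(self, match):
        """Withdraw the first tuple satisfying `match`, or return None."""
        for t in self.space.tuples:
            if match(t):
                self.space.tuples.remove(t)
                self.taken.append(t)
                return t
        return None

    def post(self, t):
        """Buffer an intermediate tuple; it stays hidden until commit."""
        self.pending.append(t)

    def commit(self):
        """Publish buffered tuples; returns False if already rolled back."""
        if self.space.active.pop(self.id, None) is None:
            return False
        self.space.tuples.extend(self.pending)
        return True

    def rollback(self):
        """Repost withdrawn tuples and discard uncommitted output."""
        if self.space.active.pop(self.id, None) is not None:
            self.space.tuples.extend(self.taken)
            self.pending.clear()


# Simulated failure: a worker takes a task, posts a partial result, then dies.
space = TupleSpace()
space.tuples.append(("task", 1))
txn = space.begin(timeout=5.0, now=0.0)
job = txn.take(lambda t: t[0] == "task")
txn.post(("result", 1))
space.expire(now=10.0)  # alarm fires after the deadline; the task is reposted
```

After the simulated time‐out, the task tuple is back in the space and the uncommitted result was never visible, so another worker can pick the task up and redo it. A late `commit()` from the failed worker is rejected rather than publishing stale output.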
