Abstract

Some parallel programming systems are libraries that let programmers write thread-based parallel programs in existing sequential languages. Parallel programs are harder to debug and considerably more complex than sequential programs, so design faults may remain in them. This paper designs and implements software fault-tolerant mechanisms for such existing parallel programming systems, using an object-oriented approach. With these software fault-tolerant objects, programmers can write reliable parallel programs on these systems. Three software fault-tolerant mechanisms are supported: Recovery Block, N-Version Programming, and Conversation. All of them are implemented and grouped into a separate software layer that resides on top of the parallel programming system. This layer monitors the behavior of applications, detects software faults, and recovers and restarts programs. The parallel programming system is responsible for managing concurrent threads and for providing the fault-tolerant mechanisms with the concurrency facilities they need. This layered architecture makes the software fault-tolerant mechanisms portable and extensible, and keeps their overhead low. We originally implemented these software fault-tolerant objects in C++ on Presto, and have since ported them to C-Thread of Mach and LWP of SunOS.
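
The Recovery Block and N-Version Programming schemes named above can be pictured as small C++ objects. What follows is a minimal sketch under our own assumptions, not the Presto-based implementation described in the paper. The names RecoveryBlock and nVersionVote are hypothetical. The recovery point is taken by copying the state, and a thrown exception is treated as a detected fault. The versions run one after another here rather than as concurrent threads managed by the underlying parallel programming system. Conversations, which coordinate recovery across several interacting threads, are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical recovery-block object (illustrative sketch only): a primary
// routine plus alternates, guarded by one acceptance test. The state T is
// checkpointed by copying it.
template <typename T>
class RecoveryBlock {
public:
    using Alternate = std::function<void(T&)>;
    using AcceptanceTest = std::function<bool(const T&)>;

    RecoveryBlock(std::vector<Alternate> alternates, AcceptanceTest test)
        : alternates_(std::move(alternates)), test_(std::move(test)) {
        assert(!alternates_.empty() && "need at least a primary routine");
    }

    // Tries the primary, then each alternate, always starting from the
    // recovery point. Commits the first result that passes the acceptance
    // test; if every attempt fails, leaves `state` untouched and returns false.
    bool run(T& state) const {
        const T checkpoint = state;            // establish the recovery point
        for (const Alternate& alt : alternates_) {
            T trial = checkpoint;              // roll back before each attempt
            try {
                alt(trial);
                if (test_(trial)) {
                    state = std::move(trial);  // commit the accepted result
                    return true;
                }
            } catch (...) {
                // An exception counts as a detected fault: try the next alternate.
            }
        }
        return false;
    }

private:
    std::vector<Alternate> alternates_;
    AcceptanceTest test_;
};

// Hypothetical N-version voter (illustrative sketch only): runs independently
// written versions on the same input and returns the strict-majority result,
// or nullopt if no majority exists. Out must support operator< (map key).
template <typename In, typename Out>
std::optional<Out> nVersionVote(
        const std::vector<std::function<Out(const In&)>>& versions,
        const In& input) {
    assert(!versions.empty());
    std::map<Out, std::size_t> votes;
    for (const auto& version : versions) {
        try {
            ++votes[version(input)];
        } catch (...) {
            // A version that throws simply casts no vote.
        }
    }
    for (const auto& [value, count] : votes) {
        if (2 * count > versions.size()) {
            return value;
        }
    }
    return std::nullopt;
}
```

For example, a RecoveryBlock<int> whose primary throws and whose first alternate produces a value rejected by the acceptance test falls through to the next alternate, each time restarting from the saved state. Three versions of a squaring routine in which one is faulty still yield the correct square, because the faulty result is outvoted two to one.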
