Abstract
Fault tolerance is essential in cluster computing and has been implemented in many well-known cluster-computing systems using checkpoint/restart mechanisms. However, existing checkpointing algorithms cannot restore the state of a file system when a program is rolled back, so existing fault-tolerance systems place many restrictions on file access. This paper presents the SCR algorithm, which is based on atomic operations and consistent scheduling and can restore the state of a file system. In the SCR algorithm, file-system calls are classified into idempotent and non-idempotent operations: a non-idempotent operation modifies the file system's state, while an idempotent operation does not. SCR tracks changes to the file-system state by logging to disk each non-idempotent operation issued by a user program, together with the information needed to undo that operation. When a program is rolled back to a checkpoint, SCR reverts the file system to its state at the time of the last checkpoint. With SCR, users may use any file operation in their programs.
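The logging-and-revert idea can be illustrated with a small undo log. This is a minimal sketch under stated assumptions, not the SCR implementation: the class `UndoLog`, its record format, and the choice of `read` as the only idempotent call and `create`/`write`/`unlink` as the non-idempotent ones are all hypothetical. The log is kept in memory for brevity, whereas SCR as described logs to disk, and the atomicity and consistent-scheduling parts of SCR are not modeled. Real system-call interception is replaced by wrapper methods.

```python
import os
import tempfile


class UndoLog:
    """Hypothetical undo log: non-idempotent calls record how to reverse themselves."""

    def __init__(self):
        # In-memory for this sketch; SCR as described persists the log on disk.
        self.records = []

    def checkpoint(self):
        # Once a checkpoint is taken, older undo records are no longer needed.
        self.records.clear()

    def read(self, path, offset, size):
        # Idempotent: does not change file-system state, so nothing is logged.
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def create(self, path):
        # Non-idempotent: undo by removing the file.
        self.records.append(("create", path))
        open(path, "xb").close()

    def write(self, path, offset, data):
        # Non-idempotent: save the bytes about to be overwritten and the old size.
        old_size = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(offset)
            old = f.read(len(data))
        self.records.append(("write", path, offset, old, old_size))
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(data)

    def unlink(self, path):
        # Non-idempotent: save the full contents so the file can be recreated.
        with open(path, "rb") as f:
            content = f.read()
        self.records.append(("unlink", path, content))
        os.remove(path)

    def rollback(self):
        # Undo records in reverse order to return to the last checkpoint state.
        for rec in reversed(self.records):
            kind = rec[0]
            if kind == "create":
                os.remove(rec[1])
            elif kind == "write":
                _, path, offset, old, old_size = rec
                with open(path, "r+b") as f:
                    f.seek(offset)
                    f.write(old)
                    f.truncate(old_size)  # drop any bytes the write appended
            elif kind == "unlink":
                _, path, content = rec
                with open(path, "wb") as f:
                    f.write(content)
        self.records.clear()


# Usage sketch in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "data.bin")
    with open(data, "wb") as f:
        f.write(b"hello")
    log = UndoLog()
    log.checkpoint()                    # checkpoint taken: log starts empty
    log.read(data, 0, 5)                # idempotent: not logged
    log.write(data, 0, b"HELLO world")  # overwrites and extends the file
    extra = os.path.join(workdir, "extra.bin")
    log.create(extra)
    log.unlink(data)
    logged = len(log.records)           # three non-idempotent operations logged
    log.rollback()                      # revert to the checkpoint state
    with open(data, "rb") as f:
        restored = f.read()
    extra_exists = os.path.exists(extra)
```

Undo records are applied in reverse order so that each one sees the file-system state its operation originally produced; for example, the deleted file is recreated before the earlier overwrite is reverted.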