Abstract

An autonomic distributed system is composed of geographically distributed autonomic components. One open challenge in autonomic computing is efficient runtime monitoring, that is, collecting the information from which the system itself can detect, diagnose, and repair problems caused by failures in software or hardware components. Communication-induced checkpointing (CIC) can be a useful tool for this purpose. CIC aims to form consistent global snapshots from which the system can recover. To achieve this, CIC solutions monitor the information exchanged among processes to identify dangerous checkpointing patterns. When a dangerous pattern is identified, it is broken by locally triggering a forced checkpoint. However, not all of the forced checkpoints triggered in this way are necessary. In this paper, we present a delayed CIC approach that reduces the number of forced checkpoints by using triggering rules called safe checkpoint conditions. Finally, we present simulation results showing that our proposal is more efficient than other current solutions.
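
To make the forced-checkpoint mechanism concrete, the following is a minimal sketch of a classic index-based CIC scheme, in the style commonly attributed to Briatico, Ciuffoletti, and Simoncini. It is not the delayed approach proposed in this paper: the safe checkpoint conditions are not specified in the abstract and are not modeled here. The names `Process`, `basic_checkpoint`, `send`, `receive`, and `demo` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process in an index-based CIC scheme (illustrative assumption)."""
    pid: int
    clock: int = 0  # index of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def basic_checkpoint(self) -> None:
        # A checkpoint taken on the process's own schedule.
        self.clock += 1
        self.checkpoints.append((self.clock, "basic"))

    def send(self) -> int:
        # Piggyback the current checkpoint index on the outgoing message.
        return self.clock

    def receive(self, piggybacked_clock: int) -> bool:
        # A larger piggybacked index signals a potentially dangerous pattern,
        # so a forced checkpoint is taken before the message is delivered.
        if piggybacked_clock > self.clock:
            self.clock = piggybacked_clock
            self.checkpoints.append((self.clock, "forced"))
            return True
        return False


def demo() -> tuple:
    p, q = Process(pid=0), Process(pid=1)
    p.basic_checkpoint()                 # p.clock becomes 1
    forced_first = q.receive(p.send())   # 1 > 0: q takes a forced checkpoint
    forced_second = q.receive(p.send())  # 1 > 1 is false: no checkpoint
    return forced_first, forced_second, q.checkpoints
```

In this simple scheme every message carrying a larger index forces a checkpoint immediately. A delayed approach such as the one described above would, presumably, postpone that decision and skip the forced checkpoints that turn out to be unnecessary.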
