Multi-agent architecture for fault recovery in self-healing systems

Pushpendra Kumar Rajput, Geeta Sikka

doi:10.1007/s12652-020-02443-8

Abstract

Self-healing, a prominent property of self-adaptiveness, provides reliability, availability, maintainability, and survivability to a software system. These qualitative factors are especially salient in modern distributed systems, in which components and their collaborations often vary. Survivability of such systems can be best addressed from an architectural viewpoint. When it comes to maintainability and reliability, however, architecture-level adaptation is seldom supported during the design phase. Incorporating fault-tolerance adaptation into the design phase of the system development process can increase software availability and thereby help attain self-healing. In distributed systems, most existing architectures treat communication and correspondence as primary criteria. A multi-agent mechanism, on the other hand, helps in schematic control of functionality and communication, with an emphasis on scalability. This paper proposes a novel architecture that supports agent-based distributed systems in addressing fault recovery to achieve self-adaptiveness. Unlike traditional multi-agent architectures, it incorporates task-oriented functional multi-agent communication for the various design-phase activities designated to fulfil self-healing criteria. An adaptation of the agent communication control flow is proposed using three novel mechanisms, namely planning, functioning, and enacting, as the agents’ critical responsibilities. The paper also validates the proposed architecture for resource- and availability-based faults related to crashes and resource unavailability using performance-based evaluation metrics. A case-based application with single-thread connectivity is used to reflect the architecture during the application design phase and is tested for success using mean response time as the evaluation metric.

Full Text