A Framework for Experimental Validation and Performance Evaluation in Fault Tolerant Distributed System

Hein Meling

doi:10.1109/ipdps.2007.370600

Publication Date: Jan 1, 2007

Citations: 1

Affiliation: University of Stavanger

Abstract

Experimental evaluation of fault tolerant distributed systems is a complex and tedious task. Automating as much as possible of the execution and evaluation of experiments is often necessary in order to test a broad spectrum of possible executions of the system and obtain good coverage. The confidence in the results obtained from an experimental evaluation depends on the degree of control over the environment in which the experiments are executed. Typically, an uncontrolled environment is exposed to numerous sources of external influence that can affect the obtained results. Automated and repeated executions can be used to reduce the impact of such influences. In this paper, a framework for experimental validation and performance evaluation of fault management in a fault tolerant distributed system is presented. The framework provides a facility to execute experiments in a configured target system. It is based on injecting faults or other events needed to test the fault handling capability of the system. Relevant events are logged and collected for postprocessing and analysis, e.g. to construct a single global timeline of events occurring at different nodes in the target system. This timeline of events can then be used to validate the behavior of a system and to evaluate its performance.
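The postprocessing step mentioned in the abstract, merging per-node event logs into a single global timeline, can be illustrated with a minimal Python sketch. This is not the framework's implementation: the `Event` record, the event names (`fault_injected`, `recovery_completed`), and the helpers `merge_timelines` and `elapsed_between` are hypothetical, and the sketch assumes that node clocks are synchronized closely enough for timestamps to be compared directly, which the abstract does not state.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical log record; field names are illustrative assumptions."""
    timestamp: float  # seconds; assumes node clocks are synchronized
    node: str
    kind: str


def merge_timelines(per_node_logs: Iterable[List[Event]]) -> List[Event]:
    """Merge per-node logs, each already sorted by timestamp, into one timeline.

    Ties on timestamp are broken by node name so the result is deterministic.
    """
    return list(heapq.merge(*per_node_logs, key=lambda e: (e.timestamp, e.node)))


def elapsed_between(timeline: List[Event], start_kind: str,
                    end_kind: str) -> Optional[float]:
    """Time from the first start_kind event to the first later end_kind event.

    Returns None if either event is missing from the timeline.
    """
    start = next((e for e in timeline if e.kind == start_kind), None)
    if start is None:
        return None
    end = next((e for e in timeline
                if e.kind == end_kind and e.timestamp >= start.timestamp), None)
    return None if end is None else end.timestamp - start.timestamp


# Illustrative data only: two nodes, one injected fault, one recovery.
logs = {
    "node-a": [Event(0.0, "node-a", "start"),
               Event(2.0, "node-a", "fault_injected")],
    "node-b": [Event(0.5, "node-b", "start"),
               Event(2.4, "node-b", "failure_detected"),
               Event(3.1, "node-b", "recovery_completed")],
}

timeline = merge_timelines(logs.values())
recovery_delay = elapsed_between(timeline, "fault_injected", "recovery_completed")

for event in timeline:
    print(f"{event.timestamp:6.2f}  {event.node}  {event.kind}")
```

With a merged timeline of this kind, validation amounts to checking that the expected events appear in the expected order, and performance evaluation amounts to measuring intervals such as the delay between fault injection and completed recovery. In an uncontrolled environment, clock skew between nodes would have to be corrected before merging.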
