Abstract

Nowadays, with the rapid growth of distributed computing systems, faults are growing in scale as well, despite the many fault detection techniques that have been proposed. Designing and implementing distributed computing systems is challenging because of their ever-increasing scale and complexity. A distributed system that becomes faulty for any reason while executing its processes can cause damage. A fault management system supports distributed systems by detecting malfunctions, errors and faults. We investigated different fault tolerance techniques used in real-time distributed systems, concentrating mainly on the types of faults, the fault detection techniques and the recovery techniques used. Link failures, resource failures and other failures must be detected and rectified so that the system works accurately and without disruption. In this way, fault management applications can determine the root cause of a distributed system failure automatically. To address fault detection in distributed systems, we propose combining proactive and reactive techniques in an expert system for managing faults.
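To make the proactive/reactive distinction concrete, here is a minimal sketch of a rule-based fault manager. It is an illustration under stated assumptions, not the authors' expert system: the class names, the heartbeat timeout and the load threshold are hypothetical. Reactive rules fire after a fault has occurred (a missed heartbeat, a down link); proactive rules fire on a warning sign before a fault occurs (a node approaching overload).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one node as seen by the fault manager."""
    name: str
    last_heartbeat: float   # time of last heartbeat, in seconds
    cpu_load: float         # fraction in [0, 1]
    link_up: bool = True


@dataclass
class Rule:
    """Expert-system style rule: if condition(node, now) holds, recommend action."""
    name: str
    kind: str               # "reactive" or "proactive"
    condition: Callable[[NodeStatus, float], bool]
    action: str


HEARTBEAT_TIMEOUT = 5.0     # assumed: seconds of silence before a node is declared failed
LOAD_WARNING = 0.9          # assumed: load level that triggers preventive action

RULES: List[Rule] = [
    # Reactive rules: the fault has already happened and is detected afterwards.
    Rule("node-crash", "reactive",
         lambda n, now: now - n.last_heartbeat > HEARTBEAT_TIMEOUT,
         "restart processes on a spare node"),
    Rule("link-failure", "reactive",
         lambda n, now: not n.link_up,
         "reroute traffic over an alternate link"),
    # Proactive rule: act on a warning sign before a fault occurs.
    Rule("overload-risk", "proactive",
         lambda n, now: n.cpu_load >= LOAD_WARNING,
         "migrate tasks away from the node"),
]


def diagnose(nodes: List[NodeStatus], now: float) -> List[Tuple[str, str, str, str]]:
    """Fire every matching rule; return (node, rule, kind, action) tuples in order."""
    findings = []
    for node in nodes:
        for rule in RULES:
            if rule.condition(node, now):
                findings.append((node.name, rule.name, rule.kind, rule.action))
    return findings


if __name__ == "__main__":
    nodes = [
        NodeStatus("n1", last_heartbeat=100.0, cpu_load=0.4),
        NodeStatus("n2", last_heartbeat=92.0, cpu_load=0.3),
        NodeStatus("n3", last_heartbeat=99.0, cpu_load=0.95, link_up=False),
    ]
    for finding in diagnose(nodes, now=100.0):
        print(finding)
```

With these inputs, n2 is flagged reactively as crashed (8 s without a heartbeat), while n3 triggers both a reactive link-failure rule and a proactive overload rule. Keeping the rules as data rather than hard-coded branches is what lets an expert system grow its knowledge base without changing the inference loop.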
