Abstract

This chapter summarizes fault-detection and fault-tolerance methods at several levels of a computing system: arithmetic circuits, field-programmable gate arrays (FPGAs), program control flow, processor systems, and algorithms. It also surveys representative work in each of these areas. Results show that the techniques discussed significantly increase system reliability. As computer chips become more complex and denser, with smaller devices, both the chips and the systems that incorporate them become increasingly vulnerable to faults arising from many sources, such as fabrication errors, operating extremes, and external disturbances. Computer systems are also increasingly used in economically critical and life-critical environments. It is thus hoped that microprocessor and computer manufacturing will undergo a paradigm shift in which reliability becomes an important metric, and fault-tolerance capabilities of varying degrees are explicitly designed into computer chips and systems to provide the levels of dependability and reliability needed in different application environments.
