Abstract

Despite programmers' best efforts to create high-quality software, some errors inevitably escape even the most rigorous testing process and are first encountered by end users. When this happens, developers need to quickly understand why the errors occurred and eliminate them. Back in 1951, at the dawn of modern computing, Stanley Gill wrote that special attention should be paid to errors that occur after a program has been started and lead to its termination. Gill is considered the founder of so-called postmortem debugging, in which a program or system is modified to record its state at the time of failure so that the programmer can later determine what happened and why. Since then, postmortem debugging has been used in many different systems, including all major general-purpose operating systems (OS) as well as specialized OS such as embedded and real-time systems. To ensure the high level of reliability expected from such critical systems, it is necessary, on the one hand, to enable rapid recovery of the system, or a part of it, after a failure and, on the other hand, to provide a mechanism that preserves as much information as possible about each failure so that its cause can be determined later. To understand the real potential of postmortem debugging tools, we first review the current state of debugging methods and the role of postmortem analysis tools, as well as the requirements that critical systems impose on postmortem debugging tools. We then describe the postmortem debugging mechanism implemented by the authors in the Baget RTOS and formulate tasks for further development.
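
To make the idea concrete, below is a minimal sketch of postmortem state capture in C, assuming a POSIX-like environment; it only illustrates the general technique and is not the mechanism implemented in the Baget RTOS. On a fatal fault the handler records the signal number and faulting address to a log file (the name crash.log is arbitrary) and then re-raises the signal so that the default action, such as producing a core dump, still takes place.

/*
 * Minimal postmortem capture sketch (illustrative only, POSIX-like OS assumed).
 * A fatal-signal handler saves a small amount of failure context to a file,
 * then re-raises the signal so the default termination/core-dump still occurs.
 */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;  /* full register state is available via the ucontext_t argument */
    char buf[128];
    int fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd >= 0) {
        /* snprintf is not formally async-signal-safe; kept here for brevity */
        int n = snprintf(buf, sizeof buf,
                         "fatal signal %d at address %p\n",
                         sig, info->si_addr);
        if (n > 0)
            write(fd, buf, (size_t)n);
        close(fd);
    }
    signal(sig, SIG_DFL);  /* restore the default disposition */
    raise(sig);            /* terminate and let the OS perform its default action */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    volatile int *p = NULL;
    *p = 42;               /* deliberate fault to exercise the handler */
    return 0;
}

A real postmortem facility records far more than this sketch (full register sets, stack traces, memory regions), but the structure is the same: intercept the fatal event, persist the state, then let the failure run its course.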
