Abstract

Among the essential components of the IBM System z10™ platform are the hardware management console (HMC) and the IBM System z™ support element (SE). Both the SE and the HMC are closed, fixed-function computer systems that include an operating system, many open-source middleware packages, and millions of lines of C, C++, and Java™ application code developed by IBM. The code on the SE and HMC is required to remain operational for long periods without a restart or reboot. As a first step toward the autonomic computing goal of continuous operation, an automatic software resource monitoring program has been integrated into the SE and HMC to look for resource, performance, and operational problems and, when appropriate, initiate recovery actions. This paper describes the embedded resource monitoring program in detail, including the types of resources monitored, the algorithms and frequency used for the monitoring, the information collected when a resource problem is detected, and the actions executed as a result. It also covers the types of problems the resource monitoring program has detected so far and the improvements that have been made on the basis of empirical evidence.

