Abstract

The IBM System z10™ server reliability, availability, and serviceability (RAS) design continues to reduce the sources of server outages through innovative RAS architecture and techniques. The z10™ server introduced functional improvements that challenged the RAS design. Increases were made in the performance of each processor, the total number of processors, the total size of the memory, the amount of cache, the bandwidth of the I/O, the thermal density, and the exposure to soft errors. These changes demanded stronger RAS functions to prevent unscheduled outages. Significant improvements were made to the IBM e-business on demand® functions (concurrent, customer-requested upgrades) that enable customers to better manage capacity without having to take planned outages. The hypervisor simplified configuration changes, such as adding cryptography or channel subsystems to logical partitions, by eliminating the need for preplanning. Single-core checkstopping and single transparent CPU (central processing unit) sparing were added. The RAS functions reduced the number of scheduled outages. Product improvements were complemented by improvements in RAS modeling. This paper describes these RAS improvements and how they provide value to the customer.
