Abstract

High availability (HA) computing has long received attention in enterprise and mission-critical systems. The goal of HA is to maximize uptime, which complements the objectives of high-performance computing (HPC). HA-OSCAR is a project that aims to improve HA in commercial-off-the-shelf (COTS)-based HPC environments. In this paper, we introduce a multi-head HPC cluster architecture in which server redundancy is the first key step toward reducing downtime. We study two HA-OSCAR configurations, active–standby and active–active, and evaluate system dependability for both. Stochastic Reward Nets (SRNs) are used to model system availability. We describe our SRN models, built with the Stochastic Petri Net Package (SPNP), and compute several results that characterize HA-OSCAR availability.
