Abstract

High performance computing (HPC) is today mostly provided on a best-effort basis. This is surprising, as the closely related fields of grid computing, which federates resources from multiple domains to support large jobs, and cloud computing, which promises seemingly infinite amounts of compute and storage, both offer quality of service (QoS), albeit in different ways. Long-term service level agreements (SLAs), which are established long in advance of their actual usage, seem a promising way to offer QoS guarantees in an HPC environment without disrupting the business models employed today. This work uses the long-term SLA approach as a basis for provisioning service levels for HPC resources and presents an SLA management framework to support it. Flexibility is achieved by offering SLAs with different service levels, support for which is integrated into job submission and scheduling. At a high level, the SLA management framework can be used generically; an implementation is presented and evaluated against a motivating scenario.
