Abstract

This paper presents a general energy management system for HPC clusters and cloud infrastructures that powers off cluster nodes when they are not being used and powers them back on when they are needed. The system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Management Systems, through a set of connectors. It can also use different mechanisms for powering the computing nodes on and off, such as Wake-on-LAN, Power Distribution Units, the Intelligent Platform Management Interface, or other infrastructure-specific mechanisms. While some existing Batch-Queuing Systems provide energy-saving mechanisms, other popular choices lack this feature. Cloud management middleware does not generally provide this feature out of the box, and adding it requires modifying the middleware. The advantage of our approach is that it can be integrated with different resource management middleware without any modification to that middleware. The paper describes the successful integration of the proposed system with the popular Torque/PBS resource management system and with the OpenNebula open source cloud management tool. Two real use cases involving two different HPC clusters are presented, and they show significant energy and cost savings of 38% and 16%.
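The connector-based design summarized above can be illustrated with a minimal Python sketch. All class and method names here (`MiddlewareConnector`, `PowerMechanism`, `manage_once`, `min_idle_on`, and the fake test classes) are hypothetical and are not the interfaces of the system described in the paper. The policy is also an assumed, simplified one: wake enough powered-off nodes to cover pending demand, and otherwise switch off idle nodes beyond a small spare pool. A real connector would query Torque/PBS or OpenNebula, and a real power back end would send Wake-on-LAN packets or issue PDU or IPMI commands.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class MiddlewareConnector(ABC):
    """Hypothetical connector: reads node state and demand from a
    batch-queuing or cloud management system without modifying it."""

    @abstractmethod
    def idle_nodes(self) -> list[str]:
        """Nodes that are powered on but running no jobs or VMs."""

    @abstractmethod
    def powered_off_nodes(self) -> list[str]:
        """Nodes that are currently switched off."""

    @abstractmethod
    def nodes_needed(self) -> int:
        """Extra nodes required by queued jobs or pending VMs."""


class PowerMechanism(ABC):
    """Hypothetical power back end (e.g. Wake-on-LAN, PDU, IPMI)."""

    @abstractmethod
    def power_on(self, node: str) -> None: ...

    @abstractmethod
    def power_off(self, node: str) -> None: ...


def manage_once(conn: MiddlewareConnector, power: PowerMechanism,
                min_idle_on: int = 0) -> None:
    """One pass of an assumed policy: wake nodes for pending demand,
    otherwise power off idle nodes beyond `min_idle_on` spares."""
    needed = conn.nodes_needed()
    if needed > 0:
        # Demand exceeds running capacity: wake just enough nodes.
        for node in conn.powered_off_nodes()[:needed]:
            power.power_on(node)
        return
    # No demand: keep a small spare pool on, switch the rest off.
    for node in conn.idle_nodes()[min_idle_on:]:
        power.power_off(node)


# In-memory fakes standing in for real middleware and hardware.
@dataclass
class FakeCluster(MiddlewareConnector):
    idle: list[str]
    off: list[str]
    demand: int = 0

    def idle_nodes(self) -> list[str]:
        return list(self.idle)

    def powered_off_nodes(self) -> list[str]:
        return list(self.off)

    def nodes_needed(self) -> int:
        return self.demand


@dataclass
class RecordingPower(PowerMechanism):
    log: list[tuple[str, str]] = field(default_factory=list)

    def power_on(self, node: str) -> None:
        self.log.append(("on", node))

    def power_off(self, node: str) -> None:
        self.log.append(("off", node))
```

For example, with three idle nodes, no queued work and `min_idle_on=1`, one pass records power-off actions for the second and third idle nodes. Keeping middleware access and power control behind separate interfaces means that supporting a new middleware or a new power mechanism only requires a new adapter class, not a change to the middleware itself.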
