Abstract

The growing importance, large scale, and high server density of high-performance computing datacenters make them prone to attacks, misconfigurations, and failures of both the cooling and the computing infrastructure. Such unexpected events often lead to thermal anomalies – hotspots, fugues, and coldspots – that increase the cost of operating the datacenter. A model-based thermal anomaly detection mechanism is proposed that compares the expected thermal map of the datacenter (obtained using heat-generation and heat-extraction models) with the observed thermal map (obtained using thermal cameras). In addition, a novel Thermal Anomaly-aware Resource Allocation (TARA) scheme is designed to induce a time-varying thermal fingerprint (thermal map) of the datacenter so as to maximize the accuracy of anomaly detection. Experiments on a small-scale testbed and trace-driven simulations show that this model-based detection solution, used in conjunction with TARA, significantly improves the detection probability compared to anomaly detection under scheduling algorithms such as random, round-robin, and best-fit-decreasing.
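To illustrate the comparison step described above, the following is a minimal sketch of model-based detection: it flags grid cells where the observed thermal map deviates from the expected one by more than a threshold. The function name `detect_anomalies`, the threshold value, and the grid shape are illustrative assumptions; the paper's actual heat-generation and heat-extraction models are not reproduced here.

```python
# Minimal sketch (assumed, not the paper's implementation): compare an
# expected thermal map (from heat models) with an observed thermal map
# (from thermal cameras) and flag cells whose deviation exceeds a threshold.
import numpy as np

def detect_anomalies(expected: np.ndarray, observed: np.ndarray,
                     threshold_c: float = 3.0) -> dict:
    """Return boolean masks of cells deviating from the expected map.

    expected, observed : 2-D temperature maps (degrees Celsius) over the
        same datacenter grid; threshold_c is the allowed deviation.
    """
    delta = observed - expected
    return {
        "hotspots": delta > threshold_c,    # observed hotter than expected
        "coldspots": delta < -threshold_c,  # observed colder than expected
    }

# Example with a synthetic 4x4 grid: one cell runs 5 C hotter than predicted.
expected = np.full((4, 4), 25.0)
observed = expected.copy()
observed[2, 1] += 5.0
masks = detect_anomalies(expected, observed)
print(np.argwhere(masks["hotspots"]))  # -> [[2 1]]
```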
