Abstract

Cooling failure in data centers (DCs) is a complex phenomenon because of the many interactions between the cooling infrastructure and the information technology (IT) equipment. To fully understand it, a system integration philosophy is vital to the testing and the design of experiments. In this paper, a facility-level DC cooling failure experiment is run and analyzed. An airside cooling failure is introduced to the facility at two different cooling set points and in both open and contained aisle configurations. Quantitative instrumentation includes pressure differentials, tile airflow, external contour and discrete air inlet temperatures, intelligent platform management interface (IPMI) data, and cooling system data during failure recovery. Qualitative measurements include infrared imaging and airflow visualization via smoke tracing. To the best of our knowledge, this is the first experimental study in the literature in which an actual multi-aisle facility cooling failure is run with a real IT (compute, network, and storage) load in the white space. This establishes a link between variations at the facility level and their effects at the central processing unit (CPU) level. The results show that, based on the external IT inlet temperature sensors, the containment configuration exhibits a longer available uptime (AU) during failure. However, the IPMI data show the opposite: the available uptime is significantly shorter when assessed from the internal IT analytics than from the external sensors. IT power, CPU temperature, and fan speed all reach higher values during the containment failure. This occurs because external impedances form instantaneously in the containment during failure, which renders the contained aisle less resilient than the open aisle. The tradeoffs between power usage effectiveness (PUE), operating expenditure (OPEX), and AU are also explained.
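PUE is conventionally defined as the ratio of total facility power to IT equipment power. The following minimal Python sketch illustrates, with purely hypothetical power readings (the values and variable names are assumptions, not data from this study), the kind of PUE comparison that underlies the tradeoff discussed above: a higher cooling set point reduces cooling power and improves PUE, but may shorten AU during a cooling failure.

```python
# Illustrative PUE calculation with hypothetical readings (not data from the study).
def pue(total_facility_power_kw: float, it_power_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    return total_facility_power_kw / it_power_kw

# Hypothetical example: the same IT load at two cooling set points.
print(pue(total_facility_power_kw=1500.0, it_power_kw=1000.0))  # 1.50 (lower set point)
print(pue(total_facility_power_kw=1350.0, it_power_kw=1000.0))  # 1.35 (higher set point)
```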
