The MIT Supercloud Dataset

Siddharth Samsi, Matthew Hubbell, Charles Yee, Anna Klein, Devesh Tiwari, Antonio Rosa, Anson Cheng, Allan Vanterpool, John Holodnack, Vijay Gadepally, Lauren Milechin, William Arcand, Andrew Prout, Julia Mullen, Albert Reuther, Chansup Byun, Baolin Li, Lindsey McEvoy, Adam Michaleas, Peter Michaleas, Jeremy Kepner, David Bestor, Matthew L Weiss, Benjamin W Price, Daniel Edelman, Michael Jones, Joseph McDonald

doi:10.1109/hpec49654.2021.9622850

Abstract

Artificial intelligence (AI) and machine learning (ML) workloads make up an increasingly large share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in how HPC clusters and commercial clouds are deployed, as well as a new focus on optimizing resource usage and allocation, deploying new AI frameworks, and providing capabilities such as Jupyter notebooks for rapid prototyping and deployment. With these changes, there is a need to better understand cluster/datacenter operations in order to develop improved scheduling policies, identify inefficiencies in resource utilization and energy/power consumption, predict failures, and detect policy violations. In this paper we introduce the MIT Supercloud Dataset, which aims to foster innovative AI/ML approaches to the analysis of large-scale HPC and datacenter/cloud operations. We provide detailed monitoring logs from the MIT Supercloud system, including CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data. This paper describes the dataset, the collection methodology, and data availability, and discusses potential challenge problems being developed using this data. Datasets and future challenge announcements will be available via https://dcc.mit.edu.
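As one illustration of the kind of utilization analysis the abstract mentions, the sketch below averages per-job GPU utilization samples and flags jobs that fall below a threshold. It is a minimal sketch under stated assumptions: the CSV layout (columns `job_id` and `gpu_util_pct`), the sample rows, the function names `mean_gpu_util` and `underutilized_jobs`, and the 20% threshold are all hypothetical and are not taken from the dataset itself; the actual file formats published via https://dcc.mit.edu may differ.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL log layout: one row per monitoring sample, with columns
# job_id and gpu_util_pct. The real MIT Supercloud files may use
# different column names, units, and file organization.
SAMPLE_LOG = """job_id,gpu_util_pct
1001,92.0
1001,88.0
1002,5.0
1002,15.0
1003,40.0
"""


def mean_gpu_util(log_text):
    """Return {job_id: mean GPU utilization (%)} over all samples of each job."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(log_text)):
        job = row["job_id"]
        totals[job] += float(row["gpu_util_pct"])
        counts[job] += 1
    return {job: totals[job] / counts[job] for job in totals}


def underutilized_jobs(log_text, threshold_pct=20.0):
    """Return sorted job IDs whose mean GPU utilization is below threshold_pct.

    The default threshold is an arbitrary illustrative choice.
    """
    means = mean_gpu_util(log_text)
    return sorted(job for job, util in means.items() if util < threshold_pct)


if __name__ == "__main__":
    print(mean_gpu_util(SAMPLE_LOG))
    print(underutilized_jobs(SAMPLE_LOG))
```

A simple mean over samples is used here only for clarity; a real analysis would likely weight by sample interval and account for multi-GPU jobs.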

Similar Papers

AI for Datacenter Optimization (ADOPT'22)
Stephanie Brink
01 May 2022

The MIT Supercloud Workload Classification Challenge
Benny J Tang ... Andrew Bowne
01 May 2022

Expressing and Managing Network Policies for Emerging HPC Systems
Sergio Rivera ... James Griffioen
28 Jul 2019

Improving the Memory Efficiency of In-Memory MapReduce Based HPC Systems
Cheng Pei ... Xuanhua Shi
01 Jan 2015