Abstract

Scientific high-throughput computing needs are growing dramatically with time, and public Clouds have become an attractive option for occasional bursts due to their ability to be provisioned with minimal advance notice. The available capacity of both compute and networking in public Clouds is, however, not well understood. This article presents the results of several production runs of the IceCube collaboration, which temporarily expanded its batch system environment with GPU-providing compute instances from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. The aim of these Cloud bursts was to push the limits of Cloud compute, with a particular emphasis on GPU-providing instances. On the compute side, we showed that it is possible to reach peaks of over 380 fp32 PFLOPS using all available GPU-providing instance types, and to integrate over 1 fp32 EFLOP-hour in a single workday using only the most cost-effective ones. On the network side, we showed intra-Cloud network throughputs of over 1 Tbps, and 100 Gbps throughputs toward on-prem storage both using shared peering arrangements and dedicated network links.
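As a back-of-the-envelope check of the figures above, sustaining a fraction of the reported 380 PFLOPS peak over a workday is enough to integrate 1 EFLOP-hour. The sketch below assumes an 8-hour workday (the abstract does not state the run's exact duration):

```python
# Back-of-the-envelope arithmetic for the abstract's compute figures.
# Assumption (not stated in the abstract): a "workday" is 8 hours.
peak_pflops = 380.0            # reported fp32 peak, all GPU instance types
target_eflop_hours = 1.0       # reported integrated compute, cost-effective types only
workday_hours = 8.0

pflop_hours_needed = target_eflop_hours * 1000.0   # 1 EFLOP-hour = 1000 PFLOP-hours
sustained_pflops = pflop_hours_needed / workday_hours

print(sustained_pflops)        # 125.0 PFLOPS sustained, well below the 380 PFLOPS peak
```

So a sustained rate of roughly a third of the observed peak already suffices, consistent with the claim that the EFLOP-hour was reached using only the most cost-effective instance types.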
