Abstract

SummaryMonitoring and testing for regression of large‐scale systems such as the NCSA's Blue Waters supercomputer are challenging tasks. In this paper, we describe the solution we came up with to perform those tasks. Our goal was to find an automated solution for running user‐level regression tests to evaluate system usability and performance. Jenkins, an automation server software, was chosen for its versatility, large user base, and multitude of plugins including collecting data and plotting test results over time. We describe our Jenkins deployment to launch and monitor jobs on remote HPC system, perform authentication with one‐time password, and integrate with our LDAP server for its authorization. We show some use cases and describe our best practices for successfully using Jenkins as a user‐level system‐wide regression testing and monitoring framework for large supercomputer systems.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call