Abstract
In production multi-user high-performance (HPC) batch computing environments, wait times for scheduled jobs are highly dynamic. For scientific users, the primary measure of efficiency is wall clock time-to-solution. In high throughput applications, such as many kinds of biological analysis, the computational work to be done can be flexibly scheduled taking a longer time on a small number of processors or a shorter time on a large number of processors. Therefore the capability to choose a platform at run-time based on both processing capabilities and availability (lowest wait time) would be attractive. The goal of our work was to create an adaptive interface to HPC systems that dynamically reschedules high-throughput calculations in response to fluctuating load, optimizing for time-to-solution. This was done by implementing middleware functionality to (1) monitor the resource load on a given compute cluster, (2) generate a plan, checking on the applicability of the plan with the defined goals and (3) adaptively choosing the optimal job dimensions (number of processors and wall-clock time) to provide the best time-to-solution results.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.