Abstract

Organisations such as research institutions and universities often increase the utilisation of their office workstations by deploying a high-throughput cycle-stealing distributed system. Such systems allow users to submit a large number of computing tasks into a central pool. The system monitors workstation activity and continually assigns tasks to idle machines. When a user becomes active on a machine, the scheduler interrupts the task running there. This approach can significantly increase resource utilisation. However, it can also waste computing cycles if tasks are interrupted too often. In this paper, we develop a detailed Population Continuous Time Markov Chain (PCTMC) model of the whole system that accurately captures the contention between interactive users and high-throughput tasks. The PCTMC framework is well suited to the inherently time-inhomogeneous nature of user behaviour and allows us to capture a large number of performance and energy consumption metrics. We fit the PCTMC model to real data and propose a methodology to forecast cluster availability in the near future. We show how to use historically collected and live data to parametrise the PCTMC model, and how efficient fluid analysis techniques can then predict the desired metrics. The fast analysis also enables exploration of various what-if scenarios. We demonstrate a working implementation of the method using the existing GPA tool for the analysis of PCTMC models. We argue that this methodology could allow system maintainers to optimise the energy and performance parameters of the system. Moreover, users would benefit, as they could use the model forecasts to better distribute and plan their large-scale computations.
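To illustrate what fluid analysis of such a population model involves, the following is a minimal sketch of mean-field ODEs for a hypothetical, deliberately simplified workstation pool. It is not the paper's PCTMC model and not GPA's input format: the three machine states (idle, running a task, in use by an interactive user), the rates `mu`, `kappa`, `gamma`, and the daily profile `user_arrival_rate` are all assumptions chosen for illustration. Interrupted tasks are returned to the queue, and the time-dependent user arrival rate plays the role of the time-inhomogeneous user behaviour.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical state vector (fluid approximation, so values are real-valued means):
#   I: idle machines, R: machines running a high-throughput task,
#   U: machines with an active interactive user, Q: queued tasks,
#   C: completed tasks (cumulative), X: task interruptions (cumulative).


def user_arrival_rate(t: float) -> float:
    """Assumed per-machine user arrival rate (per hour): higher during office hours."""
    hour = t % 24.0
    return 0.05 + (0.5 if 9.0 <= hour < 17.0 else 0.0)


def drift(t: float, s: Sequence[float], mu: float, kappa: float, gamma: float,
          lam: Callable[[float], float]) -> List[float]:
    """Right-hand side of the fluid ODEs for the hypothetical pool."""
    i, r, u, q, _c, _x = s
    l = lam(t)
    # kappa * min(I, Q): a common fluid approximation of matching queued tasks to idle machines.
    assign = kappa * min(i, q)
    return [
        -l * i + mu * u - assign + gamma * r,  # dI/dt: users leave, tasks finish, tasks start
        assign - l * r - gamma * r,            # dR/dt: tasks start, get interrupted, or complete
        l * (i + r) - mu * u,                  # dU/dt: users arrive at idle or busy machines
        -assign + l * r,                       # dQ/dt: interrupted tasks go back to the pool
        gamma * r,                             # dC/dt: completed tasks
        l * r,                                 # dX/dt: interruptions (wasted work events)
    ]


def rk4(f: Callable[[float, List[float]], List[float]], s0: Sequence[float],
        t0: float, t1: float, dt: float) -> List[Tuple[float, Tuple[float, ...]]]:
    """Fixed-step classical Runge-Kutta integrator returning the whole trajectory."""
    t, s = t0, list(s0)
    trajectory = [(t, tuple(s))]
    for _ in range(int(round((t1 - t0) / dt))):
        k1 = f(t, s)
        k2 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(s, k1)])
        k3 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(s, k2)])
        k4 = f(t + dt, [a + dt * b for a, b in zip(s, k3)])
        s = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]
        t += dt
        trajectory.append((t, tuple(s)))
    return trajectory


if __name__ == "__main__":
    N, Q0 = 200.0, 1000.0  # assumed pool size and initial backlog
    rhs = lambda t, s: drift(t, s, mu=1.0, kappa=10.0, gamma=0.25, lam=user_arrival_rate)
    traj = rk4(rhs, (N, 0.0, 0.0, Q0, 0.0, 0.0), 0.0, 48.0, 0.01)
    for t, (i, r, u, q, c, n_int) in traj[::600]:  # print every 6 hours
        print(f"t={t:5.1f}h idle={i:6.1f} running={r:6.1f} user={u:6.1f} "
              f"queued={q:7.1f} done={c:7.1f} interrupted={n_int:6.1f}")
```

A fixed-step integrator is used only to keep the sketch dependency-free; a tool such as GPA, or an adaptive ODE solver, would normally be used instead. Because the ODEs conserve both the number of machines (I + R + U) and the number of tasks (Q + R + C), these sums make a quick sanity check on any parametrisation, and forecasting amounts to re-running the integration from a state fitted to live data.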
