Abstract

Many data-analysis applications rely heavily on timely responses from execution and are referred to as time-critical data-analysis applications. Because enormous data volumes and analytical computations appear frequently, running these applications on large-scale distributed computing environments is often advantageous. The workload of big data applications is often hybrid, i.e., it contains a combination of time-critical and regular non-time-critical applications. Resource management for hybrid workloads in complex distributed computing environments is becoming more critical and needs further study. However, it is difficult to design rule-based approaches well suited to such complex scenarios because many complicated characteristics must be taken into account. Therefore, we present a reinforcement learning (RL) based resource management approach for hybrid workloads in distributed computing environments. We use neural networks to capture the desired resource management model, apply reinforcement learning with a designed value definition to gradually improve the model, and use an epsilon-greedy strategy to extend exploration throughout the reinforcement process. Extensive experiments show that the resource management solution obtained through reinforcement learning greatly surpasses the baseline rule-based models. Specifically, the model reduces both the number of missed deadlines for time-critical applications and the average job delay across all jobs in the hybrid workloads. Our reinforcement learning based approach has been demonstrated to provide an efficient resource manager for the target scenarios.
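The abstract does not give implementation details, but the epsilon-greedy exploration it mentions can be illustrated with a minimal sketch. The function names, the `q_values` list (estimated values of candidate scheduling actions), and the `epsilon` parameter below are illustrative assumptions, not the paper's actual code:

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """Pick a scheduling action: explore with probability epsilon,
    otherwise exploit the action with the highest estimated value.
    (Illustrative sketch; not the paper's implementation.)"""
    if random.random() < epsilon:
        # Explore: choose a random action to discover new behavior.
        return random.randrange(len(q_values))
    # Exploit: choose the action with the highest current value estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical example: value estimates for three candidate job placements.
q = [0.2, 0.9, 0.5]
action = epsilon_greedy_action(q, epsilon=0.1)
```

In an RL-based resource manager of this kind, `q_values` would come from the neural network's value estimates for each candidate placement, and `epsilon` is typically decayed over training so the agent shifts from exploration toward exploitation.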
