Benefiting from the flexibility, scalability, and many other advantages of cloud computing, an increasing number of research institutions and enterprises use cloud platforms to deploy their scientific and commercial workflow applications. From the perspective of cloud service providers, scheduling these workflows to improve user satisfaction and resource utilization is a major challenge. The challenge becomes more prominent in a dynamic cloud environment, where the workload of user-submitted workflows changes constantly and unpredictably. Traditional scheduling methods designed for fixed workflows or real-time independent tasks are not applicable in such scenarios. In this paper, we present an online scheduling framework for multiple real-time workflows in a dynamic environment. In this framework, the scheduler employs an algorithm based on deep reinforcement learning (R-DQN) to assign each workflow task to a suitable virtual machine instance according to the current system state. Its goal is to achieve optimal task scheduling and resource allocation, minimizing workflow makespan while maximizing resource utilization. We provide a comprehensive overview of the design and implementation of our approach, and we conduct experiments on the WorkflowSim platform using workflow benchmarks with various structures and scales. The experimental results demonstrate that our algorithm significantly outperforms the compared algorithms in terms of makespan and other metrics. Moreover, our algorithm exhibits strong generality, robustness, and scalability.
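As context for the task-assignment step described above, the following is a minimal, hypothetical sketch (in Python with PyTorch) of how a DQN-style scheduler could pick a VM for each ready task. The network shape, the state features, and the select_vm helper are illustrative assumptions, not the authors' R-DQN implementation.

    import random
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Maps a scheduler state vector to one Q-value per candidate VM."""
        def __init__(self, state_dim, num_vms):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, num_vms),  # one Q-value per VM (action)
            )

        def forward(self, state):
            return self.net(state)

    def select_vm(q_net, state, num_vms, epsilon=0.1):
        """Epsilon-greedy choice of a VM index for the current ready task."""
        if random.random() < epsilon:
            return random.randrange(num_vms)  # explore a random VM
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax().item())  # exploit the best-scoring VM

    # Hypothetical state: task features (length, deadline slack) concatenated
    # with per-VM features (speed, queue wait) for, say, 4 VMs.
    num_vms, state_dim = 4, 2 + 4 * 2
    q_net = QNetwork(state_dim, num_vms)
    state = [1200.0, 30.0] + [1.0, 5.0, 2.0, 0.5, 1.5, 3.0, 0.8, 7.0]
    print("assign task to VM", select_vm(q_net, state, num_vms))

In a full framework of this kind, the reward signal would combine makespan and utilization terms and the network would be trained with experience replay; the sketch only illustrates the online action-selection interface that maps each arriving task to a VM.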