Abstract

Cloud data centers (CDCs) have become increasingly widespread in recent years with the growth of cloud computing and high-performance computing. Because data streams are processed in multiple computation steps and tasks have heterogeneous dependencies, task failures occur frequently, degrading the user experience and incurring additional energy consumption. To reduce both task execution failures and energy consumption, this paper proposes a novel AI-driven, energy-aware, proactive fault-tolerant scheduling scheme for CDCs. First, a machine-learning prediction model is trained to classify arriving tasks as “failure-prone tasks” or “non-failure-prone tasks” according to their predicted failure rates. Then, two efficient scheduling mechanisms are proposed to allocate each type of task to the most appropriate hosts in a CDC. A vector reconstruction method is developed to construct super tasks from failure-prone tasks, and these super tasks and the non-failure-prone tasks are then scheduled separately to the most suitable physical hosts. All tasks are scheduled in earliest-deadline-first (EDF) order. Our evaluation results show that the proposed scheme can intelligently predict task failures and, compared with existing schemes, achieves better fault tolerance and lower total energy consumption.
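The classification and ordering steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained prediction model is replaced by a precomputed `predicted_failure_rate` field, the cut-off `FAILURE_THRESHOLD = 0.5` is a hypothetical value, and the vector reconstruction step that builds super tasks, as well as host selection, is omitted because the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical cut-off: tasks at or above this predicted failure rate are
# treated as failure-prone. The paper's actual threshold is not given here.
FAILURE_THRESHOLD = 0.5


@dataclass
class Task:
    task_id: str
    deadline: float                 # absolute deadline (time units assumed)
    predicted_failure_rate: float   # stand-in for the ML model's output


def classify(tasks, threshold=FAILURE_THRESHOLD):
    """Split tasks into (failure_prone, non_failure_prone) by predicted rate."""
    prone = [t for t in tasks if t.predicted_failure_rate >= threshold]
    non_prone = [t for t in tasks if t.predicted_failure_rate < threshold]
    return prone, non_prone


def edf_order(tasks):
    """Return tasks in earliest-deadline-first order (stable for ties)."""
    return sorted(tasks, key=lambda t: t.deadline)


# Usage example with made-up tasks.
tasks = [
    Task("t1", deadline=30.0, predicted_failure_rate=0.8),
    Task("t2", deadline=10.0, predicted_failure_rate=0.1),
    Task("t3", deadline=20.0, predicted_failure_rate=0.6),
    Task("t4", deadline=5.0, predicted_failure_rate=0.2),
]
prone, non_prone = classify(tasks)
print([t.task_id for t in edf_order(prone)])      # ['t3', 't1']
print([t.task_id for t in edf_order(non_prone)])  # ['t4', 't2']
```

Keeping the two groups in separate queues mirrors the abstract's point that failure-prone and non-failure-prone tasks are handled by different scheduling mechanisms; in the described scheme, the failure-prone queue would first be grouped into super tasks before placement.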

