Abstract

As big data usage grows, big data technologies have become increasingly diverse, and solving a particular problem often involves many different types of big data tasks. Realizing hybrid scheduling across these task types is therefore an urgent problem. Previously, the industry used crontab to schedule big data tasks on a regular basis. Crontab conveniently handles system and user task scheduling in a Linux environment, but it cannot meet the scheduling needs of complex business scenarios, and it requires users to write their own submission logic. This paper therefore designs a hybrid scheduling system for big data tasks based on Airflow. The system supports building different types of big data tasks into a workflow and scheduling those tasks according to the workflow. The scheduling module is also independent of the other modules, which reduces coupling between modules. The proposed method has been applied to a big data platform, and its effectiveness has been verified.
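
The idea of building heterogeneous tasks into one workflow and scheduling them by dependency can be illustrated with a minimal sketch. It uses only the Python standard library and is not the system described in this paper, nor Airflow's API. The task types (`hive`, `spark`, `shell`), the task names, and the runner functions are hypothetical stand-ins chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical runners: each stands in for the submission logic of one
# task type. A real runner would submit the job to its engine.
def run_hive(name):
    return f"hive:{name}"

def run_spark(name):
    return f"spark:{name}"

def run_shell(name):
    return f"shell:{name}"

# Dispatch table from task type to runner (illustrative assumption).
RUNNERS = {"hive": run_hive, "spark": run_spark, "shell": run_shell}

# Workflow definition: task name -> (task type, upstream task names).
WORKFLOW = {
    "extract": ("shell", []),
    "clean": ("hive", ["extract"]),
    "aggregate": ("spark", ["clean"]),
    "report": ("shell", ["aggregate"]),
}

def schedule(workflow):
    """Run every task after its upstream tasks, dispatching by task type."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {task: set(deps) for task, (_, deps) in workflow.items()}
    order = TopologicalSorter(graph).static_order()
    return [RUNNERS[workflow[task][0]](task) for task in order]

results = schedule(WORKFLOW)
```

Because the scheduler only sees task names, types, and dependencies, the type-specific submission logic stays in the runners. This mirrors, in a very simplified form, the separation between the scheduling module and the other modules that the abstract describes.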
