Abstract

The default scheduling approach in MapReduce can cause the long tail problem, because tasks are assigned to nodes unreasonably, and it incurs high scheduling overhead, because it performs a large number of task scheduling operations. To address these problems, we propose a new task scheduling approach for MapReduce, named the Iterative Task Scheduling Algorithm. The new approach schedules map tasks according to the solution of an equation for the optimal task assignment. As a result, the long tail problem is mitigated effectively and the number of task scheduling operations is significantly reduced. Two supporting methods are proposed: the first estimates the task execution time on each node, and the second produces the optimal task assignment from those estimated execution times. Comprehensive experiments were performed with real log data from Ali Cloud, and the results verify the effectiveness of the new task scheduling approach. The map runtime of the job is reduced by 23% in our experiments.
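
To make the idea of assigning tasks from known per-node execution times concrete, here is a minimal sketch in Python. It is not the Iterative Task Scheduling Algorithm itself, since the abstract does not state its equation. Instead it assumes identical map tasks and a fixed per-task time on each node, and uses a standard greedy rule: give each task to the node that would finish it earliest. The names `assign_tasks`, `makespan`, and `task_times` are hypothetical and chosen only for illustration.

```python
import heapq


def assign_tasks(task_times, num_tasks):
    """Decide how many identical tasks each node receives.

    task_times[i] is the assumed time node i needs for one task.
    Each task goes to the node that would finish it earliest, which
    keeps the per-node finish times close together.
    """
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    if not task_times or any(t <= 0 for t in task_times):
        raise ValueError("task_times must be non-empty and positive")

    counts = [0] * len(task_times)
    # Heap entry: (finish time if the node takes one more task, node index).
    heap = [(t, i) for i, t in enumerate(task_times)]
    heapq.heapify(heap)

    for _ in range(num_tasks):
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (finish + task_times[i], i))
    return counts


def makespan(task_times, counts):
    """Time at which the slowest node finishes its assigned tasks."""
    return max(n * t for n, t in zip(counts, task_times))


if __name__ == "__main__":
    times = [1, 2, 4]  # illustrative per-task times for three nodes
    plan = assign_tasks(times, 7)
    print(plan, makespan(times, plan))
```

Because the whole assignment is computed in one pass from the estimated times, each node can be handed its batch of tasks at once rather than one task per scheduling request, which is the kind of reduction in scheduling operations the abstract describes. When all nodes finish at nearly the same time, no single slow node stretches the end of the map phase.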
