Abstract

In this paper, we propose a novel algorithm that addresses the starvation of small jobs and reduces their processing time on the Hadoop platform. Current MapReduce/Hadoop schedulers are quite successful at achieving data locality, and they schedule reduce tasks with a greedy algorithm. Some jobs have hundreds of map tasks but only a few reduce tasks; in that case, the reduce tasks of the large jobs spend more time waiting, which starves the small jobs. Because map tasks and reduce tasks are scheduled separately, we can change how the scheduler launches reduce tasks without affecting the map phase. We therefore develop an optimized algorithm that schedules reduce tasks according to the shortest remaining time (SRT) of the map tasks. We apply our algorithm to the fair scheduler and the capacity scheduler, both of which are widely used in real production environments. The evaluation results show that the SRT algorithm effectively decreases the processing time of small jobs.
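The SRT idea can be illustrated with a minimal sketch. It is not the paper's implementation: the `Job` class, its fields, and both function names are hypothetical, and the sketch assumes that an estimate of each job's remaining map time is already available (the abstract does not say how it is obtained). When a reduce slot becomes free, the slot goes to the job whose map phase is expected to finish soonest, rather than to the first job that asks for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical job record; all fields are assumptions for illustration."""
    name: str
    remaining_map_time: float  # assumed estimate (seconds) until the map phase ends
    pending_reduces: int       # reduce tasks not yet launched


def pick_job_for_reduce_slot(jobs: List[Job]) -> Optional[Job]:
    """Return the job with the shortest remaining map time among jobs
    that still have reduce tasks to launch, or None if there is none."""
    candidates = [job for job in jobs if job.pending_reduces > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda job: job.remaining_map_time)


def assign_reduce_slots(jobs: List[Job], free_slots: int) -> List[str]:
    """Hand out free reduce slots one at a time in SRT order and
    return the names of the jobs that received a slot, in order."""
    order = []
    for _ in range(free_slots):
        job = pick_job_for_reduce_slot(jobs)
        if job is None:
            break  # no job has a reduce task left to launch
        job.pending_reduces -= 1
        order.append(job.name)
    return order


if __name__ == "__main__":
    jobs = [
        Job("large", remaining_map_time=600.0, pending_reduces=3),
        Job("small", remaining_map_time=20.0, pending_reduces=1),
    ]
    print(assign_reduce_slots(jobs, free_slots=2))
```

In this toy run the small job receives the first slot even though the large job is listed first, so its reduce task is not left waiting behind reduce tasks whose map phase is far from finished.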
