Abstract

Job scheduling of MapReduce is a research hot spot, especially on the heterogeneous datacenter. Huge energy consumption and operating costs are key challenges. Most of the previous work only considers the scheduling optimization of a single job. In this paper, we take multiple jobs of MapReduce as research objects and focus on the goal of “jointly optimizing the scheduling time, job costs and energy consumption.” For that, an energy- and locality-efficient MapReduce multi-job scheduling algorithm is developed for the heterogeneous datacenter. Firstly, we use rack as the basic unit of resource in job scheduling to reduce data communication between jobs and to facilitate energy savings. Secondly, according to the capacity of heterogeneous rack, we design a multi-job pre-mapping method to optimize the execution order of jobs and jointly optimize the scheduling time, job costs and energy consumption. Based this pre-mapping method, we can assign one job to the virtual machine on the same rack, so as to minimize the amount of online rack. This centralized mapping strategy is very helpful to save energy and reduce data transmission of jobs. Thirdly, the map and reduce tasks of a job will be divided into multiple task groups for parallel execution, thereby further reducing data communication and energy consumption. Finally, a lot of experimental results prove the advantages of our algorithm.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.