Abstract

As cloud computing technology matures, there is a pressing need to combine the big data processing framework Hadoop with IaaS cloud platforms. In this paper, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: a monitoring module, a scheduling module, a virtual machine management module and a virtual machine migration module. The monitoring module collects the load of both physical hosts and virtual machines (VMs), and this information is used to design the resource scheduling and data locality solutions. Second, we present a load-feedback-based resource scheduling scheme, which avoids allocating resources on overburdened physical hosts and achieves strong scalability of the virtualized cluster by adjusting the number of VMs. Third, we reuse VM migration to build a dynamic-migration-based data locality scheme, which migrates computation nodes to the hosts or racks where the corresponding storage nodes are deployed, thereby satisfying data locality requirements. We evaluate our solutions in a realistic scenario based on OpenStack. Extensive experimental results demonstrate that our solutions are effective in balancing workload and improving performance, even when the cloud system is heavily loaded.
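
The load-feedback idea can be illustrated with a small sketch. It is not the DHCI implementation: the host and VM load metrics, the thresholds, and the functions `pick_host` and `scaling_decision` are all hypothetical, chosen only to show how monitored load could steer placement away from overburdened hosts and grow or shrink the VM count.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical thresholds; the abstract gives no concrete values.
OVERLOAD_THRESHOLD = 0.8    # a host at or above this load is "overburdened"
SCALE_OUT_THRESHOLD = 0.75  # average VM load above this triggers adding a VM
SCALE_IN_THRESHOLD = 0.25   # average VM load below this triggers removing a VM


@dataclass
class Host:
    name: str
    cpu: float  # CPU utilisation reported by monitoring, in [0, 1]
    mem: float  # memory utilisation reported by monitoring, in [0, 1]

    def load(self) -> float:
        # Hypothetical combined metric: the more saturated of CPU and memory.
        return max(self.cpu, self.mem)


def pick_host(hosts: Sequence[Host]) -> Optional[Host]:
    """Return the least-loaded host that is not overburdened, or None."""
    candidates = [h for h in hosts if h.load() < OVERLOAD_THRESHOLD]
    if not candidates:
        return None  # every host is overburdened: do not allocate here
    return min(candidates, key=lambda h: h.load())


def scaling_decision(vm_loads: Sequence[float], min_vms: int = 1) -> int:
    """Return +1 to add a VM, -1 to remove one, or 0 to keep the size."""
    if not vm_loads:
        return 1  # an empty cluster needs at least one VM
    avg = sum(vm_loads) / len(vm_loads)
    if avg > SCALE_OUT_THRESHOLD:
        return 1
    if avg < SCALE_IN_THRESHOLD and len(vm_loads) > min_vms:
        return -1
    return 0


if __name__ == "__main__":
    hosts = [Host("h1", 0.9, 0.5), Host("h2", 0.4, 0.6), Host("h3", 0.3, 0.2)]
    print(pick_host(hosts).name)          # h3 (h1 is skipped as overburdened)
    print(scaling_decision([0.9, 0.8]))   # 1
    print(scaling_decision([0.1, 0.2]))   # -1
```

Using the maximum of CPU and memory utilisation is one simple way to flag a host that is saturated on either resource. A real scheduler would likely weight several metrics and smooth them over time before acting.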
