A new data-grouping-aware dynamic data placement method that take into account jobs execute frequency for Hadoop

Jia-Xuan Wu, Chang-Sheng Zhang, Bin Zhang, Peng Wang

doi:10.1016/j.micpro.2016.07.011

Abstract

Recent years have seen an increasing number of scientists employing data-parallel computing frameworks, such as Hadoop, to run data-intensive applications. Research on data-grouping-aware data placement for Hadoop has become increasingly popular. However, we observe that many data-grouping-aware data placement schemes are static and do not take MapReduce job execution frequency into consideration. Such data placement schemes lead to severe performance degradation, far below the potential efficiency of an optimal data distribution, when executing MapReduce jobs that are run frequently. In this paper, we propose a new data-grouping-aware dynamic (DGAD) data placement method based on job execution frequency. First, we build a job access correlation model among the data blocks from the relationships found in records of historical data block access. Then we use a clustering algorithm to divide the data blocks into clusters according to this model, and propose a data placement algorithm based on the data block clusters that puts correlated data blocks within a cluster on different nodes. Finally, a series of experiments is carried out to verify the proposed method. Experimental results show that the proposed method can effectively handle massive data and can noticeably improve the execution efficiency of MapReduce.
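
The three steps described in the abstract (correlation model, clustering, placement) can be illustrated with a minimal Python sketch. It is not the paper's DGAD algorithm: the abstract does not define the correlation measure or name the clustering algorithm, so the sketch assumes a co-access count weighted by job execution frequency, a simple threshold-based single-link clustering, and a least-loaded round-robin placement. All function names, the `threshold` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_correlation(job_history):
    """Assumed model: for each pair of blocks, sum the execution counts
    of every job that reads both blocks."""
    corr = defaultdict(int)
    for blocks, freq in job_history:
        for a, b in combinations(sorted(set(blocks)), 2):
            corr[(a, b)] += freq
    return dict(corr)


def cluster_blocks(blocks, corr, threshold):
    """Assumed clustering: merge blocks whose pairwise weight reaches
    `threshold` (single-link, implemented with union-find)."""
    parent = {b: b for b in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), weight in corr.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in blocks:
        groups[find(b)].append(b)
    return sorted(sorted(g) for g in groups.values())


def place_clusters(clusters, nodes):
    """Spread the members of each cluster over different nodes,
    starting with the least-loaded node."""
    placement = {}
    load = {n: 0 for n in nodes}
    for cluster in clusters:
        order = sorted(nodes, key=lambda n: (load[n], n))
        for i, block in enumerate(cluster):
            node = order[i % len(order)]
            placement[block] = node
            load[node] += 1
    return placement


# Hypothetical history: (blocks read by the job, number of executions).
job_history = [
    (["b1", "b2", "b3"], 10),  # frequently executed job
    (["b4", "b5"], 1),
    (["b3", "b4"], 1),
]
blocks = sorted({b for job_blocks, _ in job_history for b in job_blocks})
corr = build_correlation(job_history)
clusters = cluster_blocks(blocks, corr, threshold=5)
placement = place_clusters(clusters, ["n1", "n2", "n3"])
print(clusters)
print(placement)
```

With this sample history, the blocks read together by the frequently executed job form one cluster and end up on three different nodes, so that job's map tasks can run in parallel on local data. A cluster larger than the node count would wrap around and reuse nodes.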

Journal: Microprocessors and Microsystems
Publication Date: Jul 15, 2016
Citations: 12

