Map Reduce for big data processing based on traffic aware partition and aggregation

G Venkatesh, K Arunesh

doi:10.1007/s10586-018-1799-6

Abstract

Big data refers to data sets whose volume can reach 500+ terabytes per day. Velocity makes it difficult to capture, manage, process and analyze data arriving at rates such as 2 million records per day. Another characteristic of big data is variability, which makes it difficult to identify the reason for losses in, e.g., images, audio, video, sensor data and log files. Hadoop can be used to analyze this huge amount of data. With Hadoop, an approximate early result from partial execution of a job becomes available to the user before the job completes, which reduces the response time. A layer-3 traffic-aware clustering programming model is used for processing big data; it comprises the map data-processing function followed by sort and reduce techniques. The layer-3 traffic-aware clustering method is implemented on top of Hadoop, where the input is partitioned into fixed-size HDFS blocks and the map tasks generate intermediate output as a collection of key-value pairs. The conventional hash function method is used for partitioning intermediate data among reduce tasks, but it is not traffic efficient. In this paper, to reduce network traffic cost, a Map Reduce task is carried out by designing a data partition scheme and an aggregator that can reduce the merged traffic from multiple map tasks. The proposed algorithm is more efficient at reducing response time, and the simulation results show that the proposal can reduce network traffic.
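To make the contrast between conventional hash partitioning and aggregation before the shuffle concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper. It is a toy model that assumes traffic is measured as the number of key-value pairs sent to reducers, and that a single hypothetical aggregator merges the output of all map tasks. The names `hash_partition` and `shuffle_traffic` and the sample data are illustrative assumptions.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    """Conventional hash partitioner: pick a reducer from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_traffic(map_outputs, num_reducers, aggregate):
    """Count the key-value pairs sent to each reducer (toy traffic measure).

    map_outputs: one list of (key, value) pairs per map task.
    aggregate:   if True, a hypothetical aggregator first merges pairs that
                 share a key across all map tasks, so each key is sent once.
    """
    if aggregate:
        merged = Counter()
        for pairs in map_outputs:
            for key, value in pairs:
                merged[key] += value
        streams = [list(merged.items())]
    else:
        streams = map_outputs

    traffic = [0] * num_reducers
    for pairs in streams:
        for key, _value in pairs:
            traffic[hash_partition(key, num_reducers)] += 1
    return traffic


# Illustrative intermediate output of three map tasks (assumed data).
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("c", 1), ("c", 1)],
]

plain = shuffle_traffic(map_outputs, num_reducers=2, aggregate=False)
merged = shuffle_traffic(map_outputs, num_reducers=2, aggregate=True)
print("pairs sent without aggregation:", sum(plain))
print("pairs sent with aggregation:   ", sum(merged))
```

In this toy model the hash partitioner decides only which reducer receives a key, with no regard to where the data resides, while the aggregation step is what lowers the number of pairs crossing the network. A traffic-aware scheme of the kind the abstract describes would additionally take network cost into account when choosing partitions and aggregator placement, which this sketch does not attempt.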

Full Text