Abstract

The Hadoop Distributed File System (HDFS) is used for storage, together with the MapReduce programming framework for parallel processing of large datasets. Handling such complex and vast data while keeping performance at an acceptable level is a difficult problem. Hence, an improved mechanism is proposed here that enhances the job scheduling capabilities of Hadoop and optimizes the allocation and utilization of resources. Significantly, an aggregator node is added to the default HDFS architecture to improve the performance of the Hadoop name node. In this paper, four entities, viz. the name node, secondary name node, aggregator nodes, and data nodes, have been modified. Here, the aggregator node assigns jobs to data nodes, while the name node tracks the aggregator nodes. In addition, an improved ant colony optimization method is developed for scheduling jobs based on job size and expected execution time. The results demonstrate notable improvement over native Hadoop and other approaches.
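As a rough illustration of the scheduling idea, the sketch below shows how pheromone-weighted node selection can combine trail intensity with a heuristic built from job size and expected execution time, in the manner of classic ant colony optimization. This is a minimal Python sketch, not the paper's implementation; the DataNode and Job classes, the capacity field, and the parameter values alpha, beta, rho, and q are illustrative assumptions.

import random
from dataclasses import dataclass

@dataclass
class DataNode:
    node_id: str
    capacity: float          # relative processing capacity (illustrative unit, assumed)
    pheromone: float = 1.0   # trail intensity, updated after each assignment

@dataclass
class Job:
    job_id: str
    size_mb: float           # job (input split) size, used to estimate run time

def expected_time(job: Job, node: DataNode) -> float:
    """Rough run-time estimate: larger jobs on weaker nodes take longer."""
    return job.size_mb / node.capacity

def choose_node(job: Job, nodes: list[DataNode],
                alpha: float = 1.0, beta: float = 2.0) -> DataNode:
    """Pick a data node with probability proportional to
    pheromone**alpha * (1 / expected_time)**beta, as in classic ACO."""
    weights = [n.pheromone ** alpha * (1.0 / expected_time(job, n)) ** beta
               for n in nodes]
    total = sum(weights)
    r, acc = random.uniform(0.0, total), 0.0
    for node, w in zip(nodes, weights):
        acc += w
        if r <= acc:
            return node
    return nodes[-1]

def update_pheromone(nodes: list[DataNode], chosen: DataNode,
                     job: Job, rho: float = 0.1, q: float = 100.0) -> None:
    """Evaporate all trails, then deposit on the chosen node in inverse
    proportion to the estimated completion time of the assigned job."""
    for n in nodes:
        n.pheromone *= (1.0 - rho)
    chosen.pheromone += q / expected_time(job, chosen)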

Highlights

  • Hadoop clusters have acquired great acceptance for their efficiency in computation, thereby helping save time and cost. Hadoop comprises HDFS and MapReduce as its two mainstays

  • HDFS provides users with distributed storage access, while MapReduce offers distributed processing. The name node and the data nodes make up the HDFS and ensure that the distributed environment and storage facilities are efficiently managed

  • MapReduce runs tasks on a cluster, allowing data to be managed in a distributed storage system. MapReduce splits the input dataset into many blocks of 64 or 128 MB before storing them in HDFS. The two functions used by the MapReduce component are map and reduce, sketched below
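The sketch below illustrates the two functions with a word count, using a tiny local driver in place of the framework's split and shuffle machinery. It is only an illustration; in a real cluster these would run as Hadoop Streaming scripts or Java Mapper/Reducer classes.

from collections import defaultdict
from typing import Iterable

def map_fn(record: str) -> Iterable[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.split():
        yield word, 1

def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce: sum all counts that the shuffle phase grouped under one key."""
    return key, sum(values)

# Tiny local driver standing in for the framework's split/shuffle machinery.
lines = ["hadoop stores blocks", "hadoop schedules jobs"]
groups: dict[str, list[int]] = defaultdict(list)
for line in lines:
    for k, v in map_fn(line):
        groups[k].append(v)
print([reduce_fn(k, vs) for k, vs in groups.items()])
# [('hadoop', 2), ('stores', 1), ('blocks', 1), ('schedules', 1), ('jobs', 1)]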

Summary

Introduction

Hadoop clusters have acquired great acceptance for their efficiency in computation, thereby helping save time and cost. Hadoop comprises HDFS and MapReduce as its two mainstays. After virtualization of the physical cluster, cloning of a single image (e.g., cloning of a data node) can be performed, which reduces cost, enhances performance, and adds new features. When their functioning was analyzed, a majority of the known schedulers, including LATE and FCFS, failed to perform well. To enhance Hadoop performance, new methods for job scheduling and for resource allocation and utilization are proposed in this paper. Amazon EC2 nodes are used, wherein one node is designated as the master node and the others as slave nodes. In the proposed HDFS cluster, every master node is populated with many aggregator nodes, while on the slave nodes the map, shuffle, and reduce functions are executed as a three-phase process
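To make the role of the aggregator node concrete, the following minimal sketch models a name node that tracks only aggregator nodes, each of which dispatches jobs to the data nodes in its group, mirroring the division of labour described above. The class names and the least-loaded selection policy are illustrative assumptions, not the paper's actual method.

from dataclasses import dataclass, field

@dataclass
class DataNodeStub:
    node_id: str
    queue: list = field(default_factory=list)   # jobs waiting on this node

@dataclass
class AggregatorNode:
    """Sits between the name node and its data nodes: receives jobs and
    dispatches them to a data node in its group (least-loaded here, assumed)."""
    agg_id: str
    data_nodes: list
    def assign(self, job_id: str) -> str:
        target = min(self.data_nodes, key=lambda n: len(n.queue))
        target.queue.append(job_id)
        return target.node_id

@dataclass
class NameNode:
    """Tracks aggregator nodes only, not individual data nodes."""
    aggregators: list
    def submit(self, job_id: str) -> str:
        agg = min(self.aggregators,
                  key=lambda a: sum(len(n.queue) for n in a.data_nodes))
        return agg.assign(job_id)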

Scheduling Based on Improved Ant Colony Optimization Algorithm
Results and Experiments
Conclusion