Mapreduce scheduler: A bird eye view

Ashwani Sheoran, D Malathi, K Senthil Kumar

doi:10.1109/iceca.2017.8203673

Abstract

MapReduce scheduling has been an active area of research in computer science. MapReduce is a programming model used by Google to process large amounts of data in a distributed computing environment. The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using this programming model. The programming model handles node failures automatically, hiding the complexity of fault tolerance. Hadoop is the most widely adopted framework for distributed data processing because it is open source and runs on commodity hardware. Scheduling has become an important factor in achieving high performance in a Hadoop cluster, and many MapReduce scheduling algorithms have been developed for Hadoop. This paper provides an overview of six scheduling algorithms for MapReduce in Hadoop, namely the First In First Out (FIFO), Fair, Capacity, Delay, MatchMaking, and Longest Approximate Time to End (LATE) scheduling algorithms. The advantages and disadvantages of each algorithm are identified. This paper is intended to help beginners and researchers understand scheduling in big data processing.
