Abstract

MapReduce is a programming model for expressing distributed computation over massive amounts of data. It is also an execution framework for large-scale data processing on clusters. However, Hadoop has a serious limitation: its MapReduce scheduler exhibits poor performance in a heterogeneous cloud environment. The LATE scheduler for MapReduce takes heterogeneous systems into consideration. Unfortunately, it still performs poorly because it treats the progress of task computation in a static manner. To further improve computational efficiency in a heterogeneous cloud, a fine-grained and dynamic MapReduce scheduling algorithm (FiGMR) is proposed. Based on historical and real-time information obtained from each node in the cloud, FiGMR selects appropriate parameters to detect slow tasks dynamically. A slow map or reduce node is a node that takes longer than other nodes to execute map or reduce tasks. Furthermore, the map nodes are classified into high-performance nodes and low-performance nodes. The corresponding slow tasks are likewise classified into slow map tasks and slow reduce tasks. By adopting a reasonable scheduling scheme, FiGMR can launch backup map tasks on the high-performance map nodes. The experimental results indicate that the proposed FiGMR significantly reduces task execution time and improves resource utilization compared with the default Hadoop scheduler and the LATE scheduler.
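The steps named in the abstract (detecting slow tasks from real-time progress, classifying map nodes from historical data, and placing backup map tasks on high-performance nodes) can be pictured with the minimal Python sketch below. It is an illustration only, not the authors' implementation: the abstract does not give FiGMR's parameters or formulas, so the progress-rate metric, the `slow_ratio` threshold, the mean-based cutoff, and every function and field name here are assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    task_id: str
    kind: str        # "map" or "reduce"
    node: str        # node currently running the task
    progress: float  # fraction completed, in [0, 1]
    elapsed: float   # seconds since the task started


def progress_rate(task):
    """Real-time progress per second (assumed metric, not from the paper)."""
    return task.progress / task.elapsed if task.elapsed > 0 else 0.0


def detect_slow_tasks(tasks, kind, slow_ratio=0.5):
    """Return tasks of one kind whose rate is below slow_ratio * mean rate.

    Map and reduce tasks are examined separately, mirroring the split into
    slow map tasks and slow reduce tasks. slow_ratio is a hypothetical knob.
    """
    same_kind = [t for t in tasks if t.kind == kind]
    if not same_kind:
        return []
    avg = mean(progress_rate(t) for t in same_kind)
    return [t for t in same_kind if progress_rate(t) < slow_ratio * avg]


def classify_map_nodes(history):
    """Split nodes into (high, low) performance lists.

    history maps node name -> list of past map-task durations in seconds.
    The cutoff (mean of per-node average durations) is an assumption.
    """
    avg_duration = {n: mean(d) for n, d in history.items() if d}
    if not avg_duration:
        return [], []
    cutoff = mean(avg_duration.values())
    high = sorted(n for n, d in avg_duration.items() if d <= cutoff)
    low = sorted(n for n, d in avg_duration.items() if d > cutoff)
    return high, low


def plan_backups(slow_map_tasks, high_nodes):
    """Assign each slow map task a backup on a high-performance node.

    A task is never backed up on the node already running it; nodes are
    chosen round-robin, which is a simplification for illustration.
    """
    plan = {}
    for i, task in enumerate(slow_map_tasks):
        candidates = [n for n in high_nodes if n != task.node]
        if candidates:
            plan[task.task_id] = candidates[i % len(candidates)]
    return plan
```

In this sketch, the historical information drives node classification and the real-time information drives slow-task detection; a real scheduler would also have to bound the number of concurrent backup tasks and handle reduce tasks, which the sketch leaves out.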
