Abstract

Hadoop is a framework for storing and processing very large volumes of data across clusters of machines. It is built on the Hadoop Distributed File System (HDFS) for storage and on the MapReduce paradigm for processing. MapReduce is a parallel computing framework that runs large-scale data processing jobs on distributed clusters. Within this framework, scheduling (deciding which tasks run on which cluster resources, and when) strongly affects the performance and efficiency of the whole system. The goals of MapReduce scheduling are to improve performance, reduce response times, and allocate resources efficiently. This paper systematically surveys existing scheduling algorithms, proposes a new classification, and examines each category in detail. It also analyzes the core principles and objectives of these algorithms, together with their strengths and weaknesses.
