Abstract

Hadoop is an economical software framework for distributed storage and parallel processing of large data sets. The Hadoop Distributed File System (HDFS) stores data across a cluster, and MapReduce processes it in parallel. However, Hadoop's current implementation assumes that the computing nodes in a cluster are homogeneous and therefore distributes tasks equally among them. In a heterogeneous cluster, this equal load distribution causes load imbalance during storage, resource contention during task scheduling, hardware degradation through overuse, and software misconfiguration during cluster management, which are the leading causes of stragglers; as a result, Hadoop's performance degrades in heterogeneous environments. This paper reviews and analyzes the significant studies in this area and presents a new classification taxonomy that broadly divides existing straggler management and mitigation techniques into two approaches: proactive and reactive. It compares the state-of-the-art studies and identifies their limitations based on their reported results. Finally, the systematic review discusses open issues and potential directions for future work on managing and mitigating stragglers in heterogeneous Hadoop clusters.
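As context for the proactive/reactive distinction, a reactive technique such as Hadoop's speculative execution detects slow tasks at runtime and relaunches duplicates on other nodes. The following is a minimal, illustrative sketch of progress-rate-based straggler detection in that spirit (in the style of the LATE scheduler); the task records, function names, and threshold are assumptions for illustration, not details taken from the surveyed systems:

```python
# Hedged sketch: reactive straggler detection via per-task progress rates.
# All task data and the slow_fraction threshold are illustrative assumptions.

def progress_rate(task):
    """Fraction of work completed per second of runtime."""
    return task["progress"] / task["elapsed_s"]

def find_stragglers(tasks, slow_fraction=0.25):
    """Flag tasks whose progress rate falls in the slowest quantile
    and below the mean rate, as candidates for speculative re-execution."""
    rates = sorted(progress_rate(t) for t in tasks)
    cutoff = rates[max(0, int(len(rates) * slow_fraction) - 1)]
    mean = sum(rates) / len(rates)
    return [t["id"] for t in tasks
            if progress_rate(t) <= cutoff and progress_rate(t) < mean]

tasks = [
    {"id": "map_00", "progress": 0.90, "elapsed_s": 30},  # fast node
    {"id": "map_01", "progress": 0.85, "elapsed_s": 32},  # fast node
    {"id": "map_02", "progress": 0.20, "elapsed_s": 60},  # straggler
    {"id": "map_03", "progress": 0.80, "elapsed_s": 35},  # fast node
]
print(find_stragglers(tasks))  # → ['map_02']
```

A proactive technique, by contrast, would try to avoid creating such a straggler in the first place, e.g., by sizing data placement and task assignment to each node's measured capacity rather than reacting after the slow task appears.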
