Abstract
Data is the fastest-growing asset of the 21st century, and extracting insights from it has become essential, as traditional ecosystems are incapable of processing data at the resulting volume, with its varying levels of structure, and at the speed at which it is produced. Within this paradigm, the need to process mostly real-time data, among other factors, highlights the need for optimized Job Scheduling Algorithms, which are the interest of this paper. Job scheduling is one of the most important aspects of guaranteeing an efficient processing ecosystem with minimal execution time, exploiting the available resources while granting all users a fair share of those resources. Through this work, we lay the needed background on the Hadoop MapReduce framework. We run a comparative analysis of different algorithms classified by different criteria. Light is shed on several classifications: Cluster Environment, Job Allocation Strategy, Optimization Strategy, and Metrics of Quality. We also construct use cases to showcase the characteristics of selected Job Scheduling Algorithms, then present a comparative display featuring the details of these use cases.
Highlights
The new digital world is growing: new day-to-day habits are adopted, and every aspect of the world as we previously knew it has a digital equivalent. Connectivity changing from a luxury to a necessity has changed the role the internet plays. The massive increase in data-generating devices and end users, the emergence of modern terms like the Internet of Things (IoT), and the new digital life through social media all affect the amount of data being generated, stored, and in need of processing (Amir & Murtaza, 2015). As a result, big data and its frameworks or ecosystems are the keywords used to indicate the need for distributed, parallel computing.
Hadoop processes data in a distributed, parallel fashion (Hashem et al., 2016). Its default job scheduling mechanism was based on FIFO (Senthilkumar Ilango, 2016). Scheduling was later removed from MapReduce as a built-in component and is now considered a pluggable component, allowing the MapReduce job scheduling algorithm and technique to be customized per project.
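To illustrate the behavior of the default FIFO policy mentioned above, the following is a minimal conceptual sketch (not Hadoop's actual implementation; the class and job names are hypothetical): jobs are dispatched strictly in submission order, which is simple but lets a long job block shorter ones behind it.

```python
from collections import deque

class FifoScheduler:
    """Minimal sketch of FIFO job scheduling: jobs run strictly
    in submission order, regardless of size or priority."""

    def __init__(self):
        self._queue = deque()

    def submit(self, job_id):
        # Jobs are appended in arrival order.
        self._queue.append(job_id)

    def next_job(self):
        # The oldest submitted job is always dispatched first.
        return self._queue.popleft() if self._queue else None

# A short job submitted after a long one must wait its turn,
# illustrating FIFO's head-of-line blocking drawback.
sched = FifoScheduler()
sched.submit("long_batch_job")
sched.submit("short_interactive_job")
order = [sched.next_job(), sched.next_job()]
print(order)  # ['long_batch_job', 'short_interactive_job']
```

This head-of-line blocking is precisely the kind of drawback that motivates the alternative schedulers (e.g., fair-share and capacity-based policies) surveyed later in this work.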
Scheduling has been a persistent issue in various systems and clusters for a long time. The need to run tasks and allocate resources to those tasks summarizes the idea behind Job Scheduling, which has been, and continues to be, an area of interest in the research field (Mohamed & Hong, 2016).
Summary
From a practical perspective, job schedulers aim to tackle a few issues resulting from the MapReduce paradigm, alongside resource managers and negotiators. Ultimately, these schedulers are sometimes used together with optimizers acting as heuristics to achieve a certain objective, or set of objectives, under a given constraint (Hashem et al., 2018). Our motive for this work is to highlight the current job scheduling algorithms along with their strengths and drawbacks, to help find a new algorithm, or a hybrid of existing algorithms, acting as a unified ecosystem that addresses the issues considered major drawbacks in other algorithms. This survey is organized into five chapters.