Job Execution Performance Research Articles

We provide a queueing-theoretic framework for job replication schemes based on the principle “replicate a job as soon as the system detects it as a straggler”. This is called job speculation. Recent works have analyzed replication on arrival, which we refer to as replication. Replication is motivated by its implementation in Google’s BigTable. However, systems such as Apache Spark and Hadoop MapReduce implement speculative job execution. The performance and optimization of speculative job execution are not well understood. To this end, we propose a queueing network model for load balancing where each server can speculate on the execution time of a job. Specifically, each job is initially assigned to a single server by a frontend dispatcher. Then, when its execution begins, the server sets a timeout. If the job completes before the timeout, it leaves the network; otherwise, the job is terminated and relaunched or resumed at another server, where it will complete. We provide a necessary and sufficient condition for the stability of speculative queueing networks with heterogeneous servers, general job sizes and scheduling disciplines. We find that speculation can increase the stability region of the network when compared with standard load balancing models and replication schemes. We provide general conditions under which timeouts increase the size of the stability region and derive a formula for the optimal speculation time, i.e., the timeout that minimizes the load induced through speculation.
We compare speculation with redundant-$d$ and redundant-to-idle-queue-$d$ rules under an $S\&X$ model. For lightly loaded systems, redundancy schemes provide better response times. However, for moderate to heavy loads, redundancy schemes can lose capacity and have markedly worse response times when compared with the proposed speculative scheme.
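
The effect of a timeout on load can be illustrated with a small calculation. The following is a minimal sketch, not the model or formula from the paper: it assumes a hypothetical discrete execution-time distribution, assumes that a timed-out job restarts from scratch at a second server with a fresh, independent execution time and then runs to completion, and ignores queueing entirely. The names `expected_work` and `dist` are illustrative only.

```python
from fractions import Fraction


def expected_work(dist, timeout=None):
    """Expected server time consumed per job.

    dist: list of (execution_time, probability) pairs (hypothetical data).
    timeout: if None, the job runs to completion on its first server.
             Otherwise the first attempt is killed at `timeout` and the job
             is relaunched once elsewhere, where it runs to completion with
             a fresh, independent execution time (an assumption).
    """
    mean = sum(x * p for x, p in dist)
    if timeout is None:
        return mean
    # Work spent on the first attempt: min(X, timeout).
    first = sum(min(x, timeout) * p for x, p in dist)
    # Probability that the first attempt is cut off by the timeout.
    p_late = sum(p for x, p in dist if x > timeout)
    # Each relaunched job costs one more full execution on average.
    return first + p_late * mean


# Hypothetical workload: 90% of executions take 1 time unit,
# 10% are stragglers taking 10 time units.
dist = [(Fraction(1), Fraction(9, 10)), (Fraction(10), Fraction(1, 10))]

no_timeout = expected_work(dist)
with_timeout = expected_work(dist, timeout=Fraction(1))
print(float(no_timeout), float(with_timeout))
```

Under these assumptions the expected work per job drops from 1.9 to 1.19 time units, so the same servers could sustain a higher arrival rate. This is only a toy illustration of how a timeout might enlarge the stability region; if relaunched jobs were just as slow as the original attempt, the timeout would add load instead.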

As a widely used parallel computing framework for big data processing, Hadoop MapReduce puts more emphasis on high data throughput than on low job-execution latency. However, more and more big data applications developed with MapReduce require quick response times. As a result, improving the performance of MapReduce jobs, especially short jobs, is of great practical significance and has attracted increasing attention from both academia and industry. Many efforts have been made to improve the performance of Hadoop at the level of job scheduling or job parameter optimization. In this paper, we explore an approach to improve the performance of the Hadoop MapReduce framework by optimizing the job and task execution mechanism. First, by analyzing the job and task execution mechanism in the MapReduce framework, we reveal two critical limitations to job execution performance. Then we propose two major optimizations to the MapReduce job and task execution mechanisms: first, we optimize the setup and cleanup tasks of a MapReduce job to reduce the time cost during the initialization and termination stages of the job; second, instead of adopting the loose heartbeat-based communication mechanism to transmit all messages between the JobTracker and TaskTrackers, we introduce an instant messaging communication mechanism to accelerate performance-sensitive task scheduling and execution. Finally, we implement SHadoop, an optimized and fully compatible version of Hadoop that aims at shortening the execution time of MapReduce jobs, especially short jobs. Experimental results show that, compared to standard Hadoop, SHadoop achieves a stable performance improvement of around 25% on average for comprehensive benchmarks without losing scalability and speedup. Our optimization work has passed a production-level test at Intel and has been integrated into the Intel Distributed Hadoop (IDH).
To the best of our knowledge, this work is the first effort to explore optimizing the execution mechanism inside the map/reduce tasks of a job. Its advantage is that it can complement job scheduling optimizations to further improve job execution performance.
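
To see why heartbeat-based messaging can matter for short jobs, consider how long a task event waits before the next heartbeat carries it. The following is a rough sketch, assuming a fixed heartbeat interval and a constant push latency for instant messages. The numbers and the function names `heartbeat_delay` and `instant_delay` are hypothetical and are not taken from SHadoop or Hadoop.

```python
import math


def heartbeat_delay(event_time, interval):
    """Time an event waits until the next periodic heartbeat carries it.

    Heartbeats are assumed to fire at interval, 2*interval, ...
    """
    next_beat = math.ceil(event_time / interval) * interval
    return next_beat - event_time


def instant_delay(push_latency):
    """An instant message is sent as soon as the event occurs."""
    return push_latency


# Hypothetical numbers: 3 s heartbeat interval, 0.05 s push latency.
events = [0.5, 2.0, 3.5, 7.25]
hb = [heartbeat_delay(t, 3.0) for t in events]
im = [instant_delay(0.05) for _ in events]
print(hb)
print(im)
```

In this toy setting every event waits between 1 and 2.5 seconds for a heartbeat, while the pushed message is delayed only by the assumed latency. For a job that runs for a few seconds in total, such waits would be a noticeable fraction of its execution time.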

Articles published on Job Execution Performance

Improving MapReduce heterogeneous performance using KNN fair share scheduling

Stability and Optimization of Speculative Queueing Networks

Quality and energy optimized scheduling technique for executing scientific workload in cloud computing environment

Heterogeneous Job Allocation Scheduler for Hadoop MapReduce Using Dynamic Grouping Integrated Neighboring Search

Keddah

An efficient job management of computing service using integrated idle VM resources for high-performance computing based on OpenStack

Assessing the Impact of Training on Staff Performance: Evidence from Ghana Health Service in the Kumasi Metropolis

MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters

Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters

Performance Optimization for Short Job Execution in Hadoop MapReduce

Job completion time on a virtualized server with software rejuvenation

SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters

Cloud Scheduling Optimization: A Reactive Model to Enable Dynamic Deployment of Virtual Machines Instantiations

Research on the Solution to Grid Data Replication Management in Distributed Environment

Prediction of resource requirement using feedback on job execution performance

Enhancing Availability of Grid Computational Services to Ubiquitous Computing Applications

Selfish Grids: Game-Theoretic Modeling and NAS/PSA Benchmark Evaluation

A resource management and fault tolerance services in grid computing

Dynamic cluster resource allocations for jobs with known and unknown memory demands
