Abstract

Hadoop YARN is a widely used distributed computing framework, used mainly for big data processing. Yet Another Resource Negotiator (YARN) was introduced in Hadoop 2.0 to solve the scalability issues of the earlier implementation. In combination with the Hadoop Distributed File System (HDFS), Hadoop YARN can be regarded as a distributed operating system with capabilities similar to those of the proprietary Tandem NonStop kernel. It is now possible to use YARN as a general-purpose framework for distributed computing. When a job is divided into tasks that are distributed among the nodes of a cluster, and further among the CPU cores of each node, parallelism in task execution plays a major role in job completion time. In distributed computing frameworks, a global scheduler distributes the tasks of the workloads among the nodes, and local schedulers manage the tasks submitted to each node. Hadoop YARN partitions the system resources into containers and launches the tasks in them. YARN does not provide a single configuration parameter that defines the number of containers deployed concurrently on the nodes of the cluster. Instead, the number of concurrent containers depends on the fraction of a node's resources allocated to each container. If the resources are divided with coarse granularity, only a few containers can run in parallel; if they are divided with fine granularity, a larger number can run in parallel. When a Hadoop job is deployed on a cloud platform, adequate resources must be selected to meet execution-time deadlines. To achieve maximum performance with minimum resources, the YARN framework must be configured for the optimum level of concurrency. This paper studies how to control the parallelism of task execution by controlling the number of concurrent containers, and how the execution time of jobs depends on the concurrency level of tasks.
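
As an illustration of how container granularity bounds concurrency, the following is a minimal sketch and not the method used in the paper. It assumes that a node's capacity is given by the standard properties yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, that each memory request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and that the scheduler enforces both memory and vcores (some scheduler configurations consider memory only). The function names and the example numbers are hypothetical.

```python
import math


def granted_memory_mb(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container memory request up to a multiple of the minimum
    allocation (a simplifying assumption about scheduler normalization)."""
    return max(min_alloc_mb, math.ceil(request_mb / min_alloc_mb) * min_alloc_mb)


def concurrent_containers(node_mem_mb: int, node_vcores: int,
                          container_mem_mb: int, container_vcores: int,
                          min_alloc_mb: int = 1024) -> int:
    """Estimate how many containers fit on one node at the same time.

    The tighter of the two resource limits (memory or vcores) decides.
    """
    by_memory = node_mem_mb // granted_memory_mb(container_mem_mb, min_alloc_mb)
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Hypothetical node: 8192 MB and 8 vcores offered to YARN.
coarse = concurrent_containers(8192, 8, container_mem_mb=4096, container_vcores=1)
fine = concurrent_containers(8192, 8, container_mem_mb=1024, container_vcores=1)
print(coarse, fine)
```

Under these assumptions, the coarse setting (4096 MB per container) is limited by memory, while the fine setting (1024 MB per container) lets every vcore host a container. Real clusters also reserve resources for the ApplicationMaster, so the estimate is an upper bound per node.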
