Abstract

Data volumes grow larger every day, and some organizations need to analyze and process these gigantic datasets. This is the big data problem such organizations often face: a single machine cannot handle data at that scale. We therefore use the Apache Hadoop Distributed File System (HDFS) for storage and analysis. This paper presents experimental work on a MapReduce application over a health-sector dataset. The results show how the MapReduce framework behaves when mapping and reducing a large volume of data. The main problem is to examine the behavior of MapReduce applications as the dataset size increases; our analysis focuses on understanding Apache MapReduce application performance. We expected execution time to increase linearly with dataset size, but our analysis shows that it sometimes varies non-linearly as the dataset grows. The experimental results show that execution time changes noticeably as the datasets are scaled.

Keywords: Data, Hadoop, MapReduce, YARN, Single Node, Multi
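To make the map and reduce phases described above concrete, the following is a minimal in-process sketch of the MapReduce model. The record format and the disease-counting job are hypothetical illustrations, not the paper's actual health dataset or application; in a real deployment the shuffle and grouping are performed by the Hadoop framework across nodes.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (key, 1) pairs, e.g. one count per disease code.
    The 'disease' field is an assumed, illustrative record attribute."""
    for record in records:
        yield (record["disease"], 1)

def shuffle(pairs):
    """Shuffle step: group intermediate values by key, as Hadoop does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical health-sector records for illustration only.
records = [
    {"patient": 1, "disease": "flu"},
    {"patient": 2, "disease": "flu"},
    {"patient": 3, "disease": "asthma"},
]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'flu': 2, 'asthma': 1}
```

As the input list of records grows, the work in the map and reduce phases grows with it, which is the execution-time behavior the experiments in this paper measure at scale.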
