Simulation of Performance Analysis of MongoDB, PIG, HIVE Storage, Map Reduce, Spark and Yarn

Monika Monu, Sat Pal

doi:10.2139/ssrn.3365403

Abstract

Today, information varies widely in volume, complexity, variety, rate of growth, and veracity. Companies have reached a point where handling this data is a major challenge, because traditional techniques and analytical tools cannot cope with it. Big data keeps growing rapidly, and its eventual size cannot be determined. Hadoop is a framework capable of analyzing data at this scale; it is used to process large data sets across clusters of machines. Tools such as Hadoop and MapReduce can manage this volume of data, as can Apache Hive and NoSQL databases. Information extraction has become essential because unstructured text data is growing rapidly. It is computationally intensive, so MapReduce and parallel database management systems are applied to process such large volumes of information. This paper introduces big data tools such as Apache Hive and Apache Pig and compares them on several parameters. The comparison indicates that Hive performs better than Pig. The major difference between Hadoop MapReduce and Spark lies in how they process data: Spark can process data in memory, whereas Hadoop MapReduce must read from and write to disk. Their processing speeds therefore differ, and Spark can run up to 100 times faster than MapReduce.

Full Text