Abstract

Worldwide use of the Internet has been generating data at an exponential rate. The Internet has transformed business operations and the size of their consumer base. Data generation begins with the fact that there is a vast amount of information to capture and store. The rate at which data accumulates on the Internet was one of the important factors that gave rise to the concept of big data. Although big data is related to the Internet, it exists because of the growing volume of unstructured data that requires management. Organizations store this data in warehouses for future analysis. Besides storing it, an organization also needs to clean and reformat the data and then use data processing frameworks for analysis and visualization. Hadoop MapReduce and Apache Spark are two of the many frameworks available for data processing and analysis. In this chapter, both frameworks are used, and they are compared in terms of data processing parameters such as memory, CPU, latency, and query performance.

