Data Processing Framework Using Apache and Spark Technologies in Big Data

Archana Singh, Mamta Mittal, Namita Kapoor

doi:10.1007/978-981-13-0550-4_5

Abstract

Worldwide use of the Internet is generating data at an exponential rate. The Internet has transformed business operations and the size of their consumer base. Data generation begins with the fact that there is a vast amount of information to capture and store. The rate at which data accumulates on the Internet was one of the important factors that gave rise to the concept of big data. Although big data is related to the Internet, it exists because of the growing volume of unstructured data, which requires management. Organizations store this data in warehouses for future analysis. Besides storage, an organization also needs to clean and reformat the data and then use a data processing framework for analysis and visualization. Hadoop MapReduce and Apache Spark are two of the many data processing and analysis frameworks available. In this chapter, Hadoop MapReduce and Apache Spark are used, and the two are compared in terms of data processing parameters such as memory, CPU, latency, and query performance.

Full Text