Docker environment based Apache Storm and Spark Benchmark Test

Jiwon Bang, Mi-Jung Choi

doi:10.23919/apnoms50412.2020.9237049

Abstract

With the development of technologies such as high-speed Internet and the spread of social networking services (SNS), many fields now need to process big data generated in real time. Real-time streaming data processing technologies have been developed in response, and representative platforms include Apache Storm, Apache Spark, and Hadoop. The performance of these platforms, such as throughput and processing speed, varies with the server environment, so they offer scalability by letting users configure distributed systems across multiple servers. However, the more servers there are, the harder the system is to manage. Docker, a virtualization system that makes expansion easy, can solve this problem. Even so, some deployments keep a native environment without Docker because performance may be reduced, a drawback common to all virtualization systems. In this paper, we build Apache Storm and Apache Spark, both real-time data processing systems, in Docker and native environments, and we measure their performance in experiments that process JSON-format data to verify how much performance decreases in the Docker environment.
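The comparison described above reduces to measuring the same JSON workload in both environments and expressing the Docker result as a relative loss against the native one. The sketch below is a minimal illustration under stated assumptions, not the authors' benchmark: it times only `json.loads` on a single host rather than a Storm topology or Spark job, and the helper names `json_throughput` and `percent_decrease` and the sample record format are hypothetical. Running the identical script once natively and once inside a container gives the two throughput figures to compare.

```python
import json
import time


def json_throughput(records, repeats=3):
    """Return records parsed per second, taking the best of `repeats` runs.

    Hypothetical helper: measures only JSON parsing, not a full
    Storm/Spark pipeline.
    """
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for line in records:
            json.loads(line)  # parse one JSON-format record
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(records) / elapsed)
    return best


def percent_decrease(native, docker):
    """Relative throughput loss of the Docker run versus native, in percent."""
    if native <= 0:
        raise ValueError("native throughput must be positive")
    return (native - docker) / native * 100.0


if __name__ == "__main__":
    # Assumed record shape, for illustration only.
    records = ['{"id": %d, "value": "sample"}' % i for i in range(100_000)]
    print(f"{json_throughput(records):.0f} records/s")
    # Example: 1000 records/s native vs. 950 records/s in Docker
    print(percent_decrease(1000.0, 950.0))  # → 5.0
```

Using the best of several runs reduces noise from warm-up and scheduling jitter, which matters when the expected gap between Docker and native is only a few percent.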

Full Text