Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges

Fatih Gurcan, Muhammet Berigel

doi:10.1109/ismsit.2018.8567061

Abstract

In today's technological environments, the vast majority of big data-driven applications and solutions are based on real-time processing of streaming data. The real-time processing and analytics of big data streams therefore play a crucial role in the development of these applications and solutions. From this perspective, this paper defines a lifecycle for real-time big data processing. It describes existing tools, tasks, and frameworks by associating them with the phases of the lifecycle: data ingestion, data storage, stream processing, analytical data store, and analysis and reporting. The paper also investigates real-time big data processing tools, namely Flume, Kafka, NiFi, Storm, Spark Streaming, S4, Flink, Samza, HBase, Hive, Cassandra, Splunk, and SAP HANA. In addition, it discusses current challenges of real-time big data processing, such as “volume, variety and heterogeneity”, “data capture and storage”, “inconsistency and incompleteness”, “scalability”, “real-time processing”, “data visualization”, “skill requirements”, and “privacy and security”. This paper may provide valuable insights into the lifecycle, related tools and tasks, and challenges of real-time big data processing.

Full Text