Beyond Batch Processing: Towards Real-Time and Streaming Big Data

Saeed Shahrivari

doi:10.3390/computers3040117

Abstract

Today, big data are generated from many sources, and there is a huge demand for storing, managing, processing, and querying them. The MapReduce model and its open source implementation, Hadoop, have proven themselves as the de facto solution for big data processing and are inherently designed for batch, high-throughput jobs. Although Hadoop is well suited to batch jobs, there is an increasing demand for non-batch workloads such as interactive jobs, real-time queries, and big data streams. Since Hadoop is not suitable for these non-batch workloads, new solutions have been proposed to address them. In this article, we discuss two categories of these solutions: real-time processing and stream processing of big data. For each category, we discuss paradigms, strengths, and differences from Hadoop. We also introduce some practical systems and frameworks for each category. Finally, some simple experiments are performed to demonstrate the effectiveness of the new solutions compared to available Hadoop-based solutions.

Highlights

The "Big Data" paradigm has experienced expanding popularity recently
Solutions in this sector can be classified into two major categories: (i) Solutions that try to reduce the overhead of MapReduce and make it faster to enable execution of jobs in less than seconds; (ii) Solutions that focus on providing a means for real-time queries over structured and unstructured big data using new optimized approaches
For real-time queries over big data, a comprehensive benchmark has been conducted by the Berkeley AMP Lab [29]

Summary

Introduction

The "Big Data" paradigm has experienced expanding popularity recently. The term "Big Data" is generally used for datasets that are so large that they cannot be processed and managed using classical solutions such as relational database management systems (RDBMS). The most notable solution proposed for managing and processing big data is the MapReduce framework, which was initially introduced and used by Google [4]. MapReduce is designed for batch processing of large volumes of data, and it is not suitable for recent demands such as real-time and online processing. We give a brief survey that focuses on two new aspects: real-time processing and stream processing solutions for big data. There are numerous use cases for stream processing, such as online machine learning and continuous computation. These new trends need systems that are more elaborate and agile than the currently available MapReduce solutions such as the Hadoop framework.
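To make the contrast between batch and stream processing concrete, the following is a minimal single-machine sketch in Python. It is an illustration only and is not taken from the article or from Hadoop: the function names `batch_word_count` and `streaming_word_count` are hypothetical, and a word count is assumed as the example workload. The batch version produces a result only after reading the whole dataset, while the streaming version updates its state per record and can report a result after each one.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(documents: Iterable[str]) -> Counter:
    """Batch style: consume the entire dataset, then return one final result."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc.split())
    return counts


def streaming_word_count(stream: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: update running state per record and emit a snapshot each time."""
    counts: Counter = Counter()
    for record in stream:
        counts.update(record.split())
        # Yield a copy so callers cannot mutate the running state.
        yield Counter(counts)


if __name__ == "__main__":
    records = ["big data", "big streams"]  # illustrative input, not real data
    print(batch_word_count(records))
    for snapshot in streaming_word_count(records):
        print(snapshot)
```

In this sketch the last streaming snapshot equals the batch result; the difference is that intermediate answers are available while data is still arriving, which is the property that continuous computation relies on.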

The MapReduce Framework

Apache Hadoop

MapReduce Extensions

Other Models

Real-Time Big Data Processing

In-Memory Computing

Real-Time Queries over Big Data

Streaming Big Data

Experimental Results

Conclusions

Journal: Computers
Publication Date: Oct 17, 2014
Citations: 113
License type: CC BY 4.0
