Mapping the Big Data Landscape: Technologies, Platforms and Paradigms for Real-Time Analytics of Data Streams

Timothee Dubuc, Etienne B Roesch, Frederic Stahl

doi:10.1109/access.2020.3046132

Open Access

https://doi.org/10.1109/access.2020.3046132

Abstract

The ‘Big Data’ of yesterday is the ‘data’ of today. As technology progresses, new challenges arise and new solutions are developed. Due to the emergence of Internet of Things applications within the last decade, the field of Data Mining has been faced with the challenge of processing and analysing data streams in real time and under high data throughput conditions. This is often referred to as the Velocity aspect of Big Data. Whereas there are numerous reviews on Data Stream Mining techniques and applications, there is very little work surveying Data Stream processing paradigms and associated technologies, from data collection through to pre-processing and feature processing, from the perspective of the user, not that of the service provider. In this article, we evaluate a particular type of solution, which focuses on streaming data, and processing pipelines that permit online analysis of data streams that cannot be stored as-is on the computing platform. We review foundational computational concepts such as distributed computation, fault-tolerant computing, and computational paradigms/architectures. We then review the available technological solutions, and applications that pertain to data stream mining as case studies of these theoretical concepts. We conclude with a discussion of the field of data stream processing/analytics, future directions and research challenges.

Highlights

Stemming from recent technological advancement, what came to be coined the ‘Data Era’ [1]–[3] is concurrent to a dramatic increase in the portability of computerized devices.
COMPUTE PARADIGMS: So far, we have introduced the concepts of distributed computing, and Big Data appliances as a pool of computational power.
The main conclusion that can be drawn from this review is that ‘Big Data’ and real-time analytics is a highly complex and interdisciplinary field that requires diverse expertise in network communication, IT infrastructure, storage, control and optimisation; this expertise is required even before one can begin to plan the types of processing and analytical pipelines that could yield return on investment from the data available.

Summary

INTRODUCTION

Stemming from recent technological advancement, what came to be coined the ‘Data Era’ [1]–[3] is concurrent to a dramatic increase in the portability of computerized devices. Many solutions offer only slight variations to each other within the same processing paradigm, and are most often based on outdated scientific publications with limited relevance to the state of the art. As they are initially created to answer a specific problem, these solutions have their own innovations and operative modes (Apache Kafka/Samza by LinkedIn, OpenStack by Rackspace Hosting and NASA, Apache FlumeJava/Millwheel/Beam by Google, etc.), but as they are developed further, they may be extended to operate outside their original specifications, regardless of whether they can excel and respond to the specific needs of their creators in their modified state.

COMPUTATIONAL CONCEPTS

FULL STACK AND CLOUD COMPUTING

Findings

CONCLUDING REMARKS

Journal: IEEE Access
Publication Date: Dec 21, 2020
Citations: 91
License type: CC BY 4.0
