Big Data Velocity Management–From Stream to Warehouse via High Performance Memory Optimized Index Join

M Asif Naeem, Farhaan Mirza, Noreen Jamil, Gerald Weber, David Sundaram, Habib Ullah Khan

doi:10.1109/access.2020.3033464

Abstract

Efficient resource optimization is critical for managing the velocity and volume of real-time streaming data in near-real-time data warehousing and business intelligence. This article presents a memory optimization algorithm for rapidly joining streaming data with persistent master data in order to reduce data latency. Typically, during the transformation phase of ETL (Extraction, Transformation, and Loading), a stream of transactional data needs to be joined with master data stored on disk. This process is commonly implemented with a semi-stream join operator. Most semi-stream join operators cache frequently accessed parts of the master data to improve their performance; this requires careful distribution of the allocated memory among the components of the join operator. This article presents a cache inequality approach to optimize cache size and memory. To test this approach, we present a novel Memory Optimal Index-based Join (MOIJ) algorithm. MOIJ supports many-to-many joins and adapts to dynamic streaming data. We also present a cost model for MOIJ and compare its performance with existing algorithms both empirically and analytically. We envisage that the enhanced ability to process near-real-time streaming data using minimal memory will reduce latency in processing big data and contribute to the development of high-performance real-time business intelligence systems.
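The abstract describes the general pattern MOIJ belongs to: each stream tuple is joined with disk-resident master data through an index, while frequently accessed master data is kept in memory. The Python sketch below illustrates that general semi-stream index join with a frequency-based cache. It is not MOIJ itself: the names `DiskMasterData` and `semi_stream_join`, the admission rule (`hot_threshold`), and the fixed `cache_capacity` are hypothetical, each `lookup` call is assumed to stand for one disk access, and the article's cache inequality and memory distribution are not implemented here.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


class DiskMasterData:
    """Hypothetical stand-in for disk-resident master data reached via an index.

    Each lookup() call is counted as one disk access (an assumption for illustration).
    Several master rows may share a key, so many-to-many joins are possible.
    """

    def __init__(self, rows: Iterable[Tuple[str, str]]) -> None:
        self._index: Dict[str, List[str]] = {}
        for key, attr in rows:
            self._index.setdefault(key, []).append(attr)
        self.disk_reads = 0

    def lookup(self, key: str) -> List[str]:
        self.disk_reads += 1
        return self._index.get(key, [])


def semi_stream_join(
    stream: Iterable[Tuple[str, str]],
    master: DiskMasterData,
    cache_capacity: int,
    hot_threshold: int,
) -> Iterator[Tuple[str, str, str]]:
    """Join each stream tuple (key, payload) with every matching master row.

    Keys seen at least `hot_threshold` times are admitted to an in-memory cache
    until it holds `cache_capacity` keys (hypothetical admission policy, not MOIJ's).
    Stream tuples with no matching master row are dropped (inner join).
    """
    cache: Dict[str, List[str]] = {}
    freq: Counter = Counter()
    for key, payload in stream:
        freq[key] += 1
        if key in cache:
            matches = cache[key]  # served from memory, no disk access
        else:
            matches = master.lookup(key)  # index lookup on disk
            if freq[key] >= hot_threshold and len(cache) < cache_capacity:
                cache[key] = matches
        for attr in matches:
            yield (key, payload, attr)


if __name__ == "__main__":
    master = DiskMasterData([("p1", "Widget"), ("p1", "Widget-EU"), ("p2", "Gadget")])
    stream = [("p1", "t1"), ("p1", "t2"), ("p1", "t3"), ("p2", "t4")]
    joined = list(semi_stream_join(stream, master, cache_capacity=1, hot_threshold=2))
    print(len(joined), master.disk_reads)  # 7 3
```

Counting `disk_reads` makes the trade-off visible: the third `p1` tuple is served from the cache, so four stream tuples cost only three index lookups. This is why the share of memory given to the cache, relative to the other components of the operator, matters.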

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 29
License type: CC BY-NC-ND 4.0

Similar Papers

Big data processing and analysis platform for condition monitoring of electric power system
Yuanjun Guo ... Yong Wang
01 Aug 2016

On the Research of Big Data Storage
H.F Qin ... Z.M Qian
01 Jan 2015

A Multi-way Semi-stream Join for a Near-Real-Time Data Warehouse
M Asif Naeem ... Gerald Weber
01 Jan 2017

Cloud computing and big data: Technologies and applications
Mostapha Zbakh ... Mohamed Essaaidi
Concurrency and Computation: Practice and Experience, Vol. 29
29 Mar 2017