An Efficient Approach for Storage of Big Data Streams in Distributed Stream Processing Systems

Sultan Alshamrani, Wael Alosaimi, Quadri Waseem, Hamza Turabieh, Hashem Alyami, Abdullah Alharbi

doi:10.14569/ijacsa.2020.0110514

Abstract

Besides centralized managing, processing, and querying, storage is one of the important components of big data management. There is always a large demand for storing immense volumes of heterogeneous data in different formats. In big data stream processing applications, storage is given priority and plays a major role in historical data analysis. During stream processing, some of the incoming data and the intermediate results are a good source of future samples. These samples can be used in future evaluations to eliminate the numerous mistakes made in storing and maintaining big data streams. Hence, a big data stream application requires efficient storage support for historical queries. Researchers, scientists, and academicians are working to develop a sophisticated storage mechanism that keeps the most useful data for future reference by means of stream archive storage. However, a stream processing system cannot store the whole incoming data stream for future reference. A technique is needed to get rid of the expired data and free space in the archive storage for more incoming data. Hence, keeping in view the storage space limitation, the integration issues, and the associated cost, we try to optimize the stream archive storage and free more space for future data. The proposed enhanced algorithm helps to delete the obsolete data (retention or expired) and free space for the new incoming data on a distributed platform. Our paper presents an Enhanced Time Expired Algorithm (ETEA) for stream archive storage in a distributed environment. It removes obsolete data based on time expiration and provides space for the new incoming data for historical data analysis during the skew time (hot spots). We also evaluated the efficiency of our algorithm using the skew factor. The experimental results show that our approach is 98% efficient and faster than other conventional techniques.

Highlights

Big data management is a way of centrally storing, managing, processing, and querying huge volumes of available data in numerous formats [1,2,3,4,5]
This paper makes the following contributions: 1) We discuss some of the open issues related to the storage of big data streams in distributed stream processing systems and elaborate on storage optimization for archived data in distributed stream databases (DSDBMs) (mentioned in the Background section)
Conventional methods like First In First Out (FIFO) and other related solutions are less efficient when the stream databases are under skew time (hot tuples, when most of the users and servers are at maximum utilization)

Summary

INTRODUCTION

Big data management is a way of centrally storing, managing, processing, and querying huge volumes of available data in numerous formats [1,2,3,4,5]. ... [9], and non-batch processing includes real-time online transaction processing (OLTP) database management systems (DBMS). They have variable workloads and spikes in traffic, and they always depend on a shared-nothing architecture, besides using main memory for processing and scalability. Our proposed algorithm provides a solution to the space limitation of stream archive storage by detecting and deleting the retention data, freeing space for the new incoming stream data without adding more external storage. The algorithm saves the cost of extra storage and avoids the associated integration issues. This paper makes the following contributions: 1) We discuss some of the open issues related to the storage of big data streams in distributed stream processing systems and elaborate on storage optimization for archived data in distributed stream databases (DSDBMs) (mentioned in the Background section). The rest of this paper is organized as follows: Section 2 gives the background, Section 3 is the literature review, Section 4 covers related works, Section 5 introduces our proposed algorithm, Section 6 details our algorithm (Enhanced Time Expired Algorithm, ETEA), Section 7 presents our evaluation, and Section 8 gives our conclusion and future work.
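The idea of freeing archive space by deleting time-expired data can be illustrated with a minimal sketch. This is not the authors' ETEA, whose details are not given in this summary; it is a generic time-expiration purge under stated assumptions: a single fixed retention period, an in-memory archive, and hypothetical names (`StreamArchive`, `retention_seconds`, `purge_expired`) chosen only for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class StreamArchive:
    """Toy in-memory archive (hypothetical) that evicts time-expired tuples."""

    retention_seconds: float  # assumed fixed retention period for every tuple
    _items: Dict[int, Any] = field(default_factory=dict)
    # Min-heap of (expiry_time, item_id), so the earliest expiry is on top.
    _expiry_heap: List[Tuple[float, int]] = field(default_factory=list)
    _next_id: int = 0

    def append(self, payload: Any, arrival_time: float) -> int:
        """Store a tuple and record when it will expire."""
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = payload
        expiry = arrival_time + self.retention_seconds
        heapq.heappush(self._expiry_heap, (expiry, item_id))
        return item_id

    def purge_expired(self, now: float) -> int:
        """Delete every tuple whose expiry time is <= now; return the count."""
        removed = 0
        while self._expiry_heap and self._expiry_heap[0][0] <= now:
            _, item_id = heapq.heappop(self._expiry_heap)
            if item_id in self._items:
                del self._items[item_id]
                removed += 1
        return removed

    def __len__(self) -> int:
        return len(self._items)


archive = StreamArchive(retention_seconds=60.0)
archive.append("reading-a", arrival_time=0.0)   # expires at t = 60
archive.append("reading-b", arrival_time=30.0)  # expires at t = 90
archive.append("reading-c", arrival_time=90.0)  # expires at t = 150
removed = archive.purge_expired(now=100.0)
```

Unlike a FIFO policy, which evicts by insertion order regardless of age, this sketch evicts only tuples whose retention period has elapsed. The heap keeps each purge proportional to the number of expired tuples rather than to the size of the archive.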

BACKGROUND

Need of Historic Data

Open Issues of Storage Optimization

LITERATURE REVIEW

RELATED WORKS

Resource Management

Storage Optimization

Problem Definition

Our Contribution

ALGORITHM DETAILS

PERFORMANCE EVALUATION

Results

CONCLUSION AND FUTURE WORK

Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2020
License type: cc-by

Similar Papers

Priority-Based Resource Scheduling in Distributed Stream Processing Systems for Big Data Applications
Paolo Bellavista ... Andrea Reale
01 Dec 2014

When FPGA-Accelerator Meets Stream Data Processing in the Edge
Song Wu ... Haikun Liu
01 Jul 2019

DIsCO: DynamIc Data COmpression in Distributed Stream Processing Systems
Nikos Zacheilas ... Vana Kalogeraki
01 Jan 2017

S2p: Provenance Research for Stream Processing System
Qian Ye ... Minyan Lu
Applied Sciences | VOL. 11
15 Jun 2021
