Abstract

Besides centralized managing, processing and querying, storage is one of the important components of big data management. There is always a requirement to store immense volumes of heterogeneous data in different formats. In big data stream processing applications, storage is given priority and plays a major role in historical data analysis. During stream processing, some of the incoming data and the intermediate results are a good source of future samples. These samples can be used in future evaluation to eliminate mistakes in storing and maintaining big data streams. Hence, a big data stream application requires efficient storage support for historical queries. Researchers, scientists and academicians are working to develop a sophisticated storage mechanism that keeps the most useful data for future reference by means of stream archive storage. However, a stream processing system cannot store the whole incoming stream for future reference. A technique is needed to get rid of expired data and free space for more incoming data in the archive storage. Hence, keeping in view the storage space limitation, the integration issues and the associated cost, we try to optimize the stream archive storage and free more space for future data. The proposed enhanced algorithm helps delete obsolete data (retention or expired data) and free space for new incoming data on a distributed platform. This paper presents an Enhanced Time Expired Algorithm (ETEA) for stream archive storage in a distributed environment, which removes obsolete data based on time expiration and provides space for new incoming data for historical data analysis during skew time (hot spots). We also evaluated the efficiency of our algorithm using the skew factor. The experimental results show that our approach is 98% efficient and faster than conventional techniques.

Highlights

  • Big data management is the centralized storing, managing, processing and querying of huge volumes of data available in numerous formats [1,2,3,4,5]

  • This paper makes the following contribution: 1) We discuss some of the open issues related to the storage of big data streams in distributed stream processing systems and elaborate on storage optimization for archived data in distributed stream databases (DSDBMs) (see the Background section)

  • Conventional methods such as First In First Out (FIFO) and related solutions are less efficient when the stream databases are under skew time (hot tuples, when most of the users and servers are at maximum utilization)


Summary

INTRODUCTION

Big data management is the centralized storing, managing, processing and querying of huge volumes of data available in numerous formats [1,2,3,4,5]. [9], and non-batch processing includes real-time online transaction processing (OLTP) database management systems (DBMS). They have variable workloads and spikes in traffic, and they always depend on a shared-nothing architecture besides using main memory for processing and scalability. Our proposed algorithm addresses the space limitation of stream archive storage by detecting and deleting retention data, which frees space for new incoming stream data without adding more external storage. The algorithm therefore saves the cost of extra storage and avoids the associated integration issues. This paper makes the following contribution: 1) We discuss some of the open issues related to the storage of big data streams in distributed stream processing systems and elaborate on storage optimization for archived data in distributed stream databases (DSDBMs) (see the Background section). The rest of this paper is organized as follows: Section 2 gives the background, Section 3 is the literature review, Section 4 covers related works, Section 5 introduces our proposed algorithm, Section 6 details the algorithm (Enhanced Time Expired Algorithm (ETEA)), Section 7 presents our evaluation and Section 8 gives our conclusion and future works.
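The idea of freeing archive space by deleting time-expired data can be illustrated with a minimal sketch. It is not the ETEA itself, whose details are given only in the full text; it is a toy, single-node illustration under stated assumptions. The names `ArchivedTuple`, `StreamArchive`, `ttl` and `purge_expired` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArchivedTuple:
    key: str
    arrived_at: float  # arrival time in seconds
    ttl: float         # retention period in seconds (hypothetical field)


class StreamArchive:
    """Toy fixed-capacity archive that purges time-expired tuples."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.tuples: List[ArchivedTuple] = []

    def purge_expired(self, now: float) -> int:
        """Delete tuples whose retention period has elapsed; return the count."""
        before = len(self.tuples)
        self.tuples = [t for t in self.tuples if t.arrived_at + t.ttl > now]
        return before - len(self.tuples)

    def insert(self, item: ArchivedTuple, now: float) -> bool:
        """Store a tuple, purging expired data first if the archive is full."""
        if len(self.tuples) >= self.capacity:
            self.purge_expired(now)
        if len(self.tuples) >= self.capacity:
            return False  # nothing had expired, so no space could be freed
        self.tuples.append(item)
        return True
```

Unlike a FIFO policy, which always drops the oldest tuple regardless of its remaining usefulness, this sketch removes only tuples whose assumed retention period has passed, and it rejects an insert when nothing has expired. How the actual algorithm behaves in that case, and how it handles distribution and skew, is not specified in this summary.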

BACKGROUND
Need of Historic Data
Open Issues of Storage Optimization
LITERATURE REVIEW
RELATED WORKS
Resource Management
Storage Optimization
Problem Definition
Our Contribution
ALGORITHM DETAILS
PERFORMANCE EVALUATION
Results
CONCLUSION AND FUTURE WORK