Abstract

To meet user requirements for speed, capacity, storage efficiency, and security, and with the goal of eliminating data redundancy and reducing data storage space, an unbalanced big data compatible cloud storage method based on redundancy elimination technology is proposed. A new big data acquisition platform is designed on top of Hadoop and NoSQL technologies, through which efficient acquisition of unbalanced data is realized. The collected data are classified by a classifier, the classified unbalanced big data are compressed with the Huffman algorithm, and data security is strengthened through encryption. Based on the processed data, redundancy is removed using a data deduplication algorithm, and a cloud platform is designed to store the deduplicated data in the cloud. The results show that the proposed method achieves a high deduplication rate and deduplication speed while requiring less storage space, effectively reducing the data storage burden.
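
As a rough illustration of the Huffman compression step mentioned above, the Python sketch below builds a prefix-free code table from byte frequencies, so that frequent byte values receive shorter codes; the byte-level granularity and all function names are assumptions made for illustration, not the paper's implementation.

    import heapq
    from collections import Counter

    def build_huffman_codes(data: bytes) -> dict:
        """Build a prefix-free code table: frequent byte values get shorter codes."""
        freq = Counter(data)
        # Heap entries are (frequency, tie_breaker, node); a node is either a byte
        # value (leaf) or a (left, right) pair of sub-nodes.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:                     # degenerate case: one distinct symbol
            return {heap[0][2]: "0"}
        counter = len(heap)
        while len(heap) > 1:                   # repeatedly merge the two rarest nodes
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (left, right)))
            counter += 1
        codes = {}
        def walk(node, prefix=""):
            if isinstance(node, tuple):        # internal node: descend both sides
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                              # leaf: record the accumulated code
                codes[node] = prefix
        walk(heap[0][2])
        return codes

    def huffman_encode(data: bytes) -> str:
        codes = build_huffman_codes(data)
        return "".join(codes[b] for b in data)

    sample = b"unbalanced big data, unbalanced big data"
    bits = huffman_encode(sample)
    print(len(sample) * 8, "raw bits ->", len(bits), "encoded bits")

Because the code lengths track byte frequencies, the more skewed the data are, the larger the reduction in stored size, which is the property the compression step relies on.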

Highlights

  • In the big data environment, data security and privacy protection face significant challenges

  • The redundancy in the big data saved by application systems is found to be as high as 60%, and it grows over time. Traditional data storage technology reduces data redundancy through coding mapping based on the internal relationships within the data, thereby increasing data density and reducing the space the data occupy [2, 3]

  • To address the above problems, this paper explores an unbalanced big data compatible cloud storage method based on redundancy elimination technology

Summary

Introduction

In the big data environment, data security and privacy protection face significant challenges. Existing methods improve data storage efficiency to a certain extent, but problems such as poor redundancy elimination and high storage space consumption remain. This paper addresses these problems as follows:

(1) Based on Hadoop and NoSQL, a data collection platform is designed in which multiple concurrent data collection modules run on multiple machines at the same time, improving the data collection efficiency of the entire platform

(2) The Huffman algorithm is used to compress the data, which significantly reduces the storage space the data occupy and improves query speed under this storage scheme

(3) The redundancy elimination algorithm in unbalanced big data encryption technology detects duplicate data objects in the data flow, transmits and stores only a unique copy of each data object, and replaces the remaining duplicates with references to that unique copy; this eliminates identical files or data blocks in the big data set, effectively reduces the storage space occupied by the big data, and lowers the amount of data transmitted over the network (a minimal sketch of this idea follows this list)
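
To make point (3) concrete, the sketch below shows one common form of fingerprint-based deduplication over fixed-size blocks, written in Python: each block is fingerprinted, only one copy per unique fingerprint is stored or transmitted, and repeats are replaced by references. The 4 KB block size, the SHA-256 fingerprint, and all function names are illustrative assumptions, not the authors' algorithm.

    import hashlib

    BLOCK_SIZE = 4096  # assumed fixed block size for illustration

    def deduplicate(stream: bytes, store: dict) -> list:
        """Split the stream into blocks, keep one stored copy per unique
        fingerprint, and return a recipe of fingerprints (references)."""
        recipe = []
        for off in range(0, len(stream), BLOCK_SIZE):
            block = stream[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()  # block fingerprint
            if fp not in store:                     # only unique copies are stored
                store[fp] = block
            recipe.append(fp)                       # duplicates become references
        return recipe

    def restore(recipe: list, store: dict) -> bytes:
        """Rebuild the original stream from the recipe and the unique copies."""
        return b"".join(store[fp] for fp in recipe)

    store = {}
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # 16 KB with repeated content
    recipe = deduplicate(data, store)
    print("blocks:", len(recipe), "unique copies stored:", len(store))
    assert restore(recipe, store) == data

In this toy run the 16 KB stream yields 4 blocks but only 2 unique stored copies, which is the kind of storage-space reduction the redundancy elimination step aims at.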

Unbalanced Big Data Compatibility Cloud Storage Method
Simulation Experiment
Analysis of Experimental Results
Conclusion