Maintaining Data Integrity in Cloud Systems through Version Management

Yiming Tu, Yipin Wang, Tsozen Yeh

doi:10.1504/ijahuc.2020.10029486

Abstract

As the era of big data arrives, the enormous amount of data collected has far exceeded what traditional computer systems can appropriately handle and process. Accordingly, cloud computing has been widely used to facilitate the processing of big data. Individual data files often contain data inserted at different times, which means their contents have gone through chronological versions since the files were created. Hadoop is one of the most popular cloud systems in use today. Unfortunately, it does not provide an efficient scheme for managing file versions. Previously, we improved Hadoop by realising autonomous snapshot and extra duplication for files covered in snapshots. In this paper, we report our efforts to design and implement version management for files in snapshots. With the help of autonomous snapshot and extra file duplication, version management can further maintain data integrity for important files contained in snapshots.

Full Text