Abstract

With the arrival of the Internet of Things (IoT) era, the volume of data being generated far exceeds what conventional computer systems can store and process. Cloud computing has therefore become a promising way to store and handle such enormous amounts of data. Data availability is one of the most critical issues in data management: losing data, particularly important data, can cause irreparable damage. Most cloud platforms maintain multiple copies of data to lower the risk of loss from hardware failures or natural disasters. Hadoop is one of the most popular cloud platforms. Its default file system, the Hadoop Distributed File System (HDFS), keeps multiple copies of every file. In addition, HDFS provides a snapshot function that makes point-in-time copies of files in the file system for later recovery. Files included in snapshots are usually the ones users consider crucial; otherwise users would not bother to snapshot them. Ideally, such important files should have more replicas so that they remain available during hardware failures or disasters. However, the current snapshot scheme only records the contents of files at the moments snapshots are taken and does not affect the number of replicas kept in HDFS. It would be desirable for files in snapshots to have more copies so that their availability improves accordingly, but HDFS cannot achieve this efficiently. We modified HDFS to automatically increase the number of replicas for files in snapshots, thereby improving the availability of the important files that snapshots cover. Experimental results show that our design and implementation noticeably outperform the current HDFS at taking snapshots and increasing the number of replicas for files in snapshots.

