Enabling top-n file retrieval in cloud storage using Hadoop Distributed File System

J Jospin Jeya, E Kannan

doi:10.1109/iconstem.2016.7560944

Abstract

A cloud storage system can be regarded as a very large-scale storage system built from independent storage servers. The service it provides lets a user store data remotely over the network and lets other authenticated users access that data easily. The Hadoop Distributed File System (HDFS) is used to store large files reliably and to stream those files to user applications at very high bandwidth. Hadoop splits files into large blocks and distributes them among the nodes in the cluster. When data is retrieved from the cloud, it is important to reduce the computation and communication overhead. To reduce the communication overhead, the server should send only the top-n files matching the keyword when the user requests data files. Because the owner need not keep a copy of the files, it is all the more necessary to check periodically that the files stored on the server remain available and unaltered. In HDFS the computation is done in parallel, so the execution time is drastically reduced. The proposed system uses HDFS to retrieve the top-n files, so that the search time and the communication overhead are greatly reduced.
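
The top-n selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how files are ranked, so the sketch assumes a simple count of keyword occurrences, and the names `keyword_score` and `top_n_files` are hypothetical. A thread pool stands in for the parallel computation on cluster nodes; no Hadoop API is used.

```python
import heapq
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def keyword_score(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword in text.

    Assumption: occurrence count is the relevance score; the source does
    not specify the ranking function.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def top_n_files(files: Dict[str, str], keyword: str, n: int) -> List[Tuple[str, int]]:
    """Return up to n (file name, score) pairs with the highest non-zero scores."""
    names = list(files)
    # Score every file concurrently; a stand-in for per-node parallel work.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda name: keyword_score(files[name], keyword), names))
    # Files that never mention the keyword are not candidates.
    matches = [(name, score) for name, score in zip(names, scores) if score > 0]
    # Highest score first; ties are broken by file name for a stable result.
    return heapq.nsmallest(n, matches, key=lambda item: (-item[1], item[0]))
```

Only the n selected entries would need to be sent back to the user, which is the source of the reduced communication overhead: the remaining matches never leave the server.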
