Abstract

A cloud storage system can be regarded as a very large-scale storage system built from independent storage servers. Cloud storage lets users store their data remotely over a network and lets other authenticated users access that data easily. The Hadoop Distributed File System (HDFS) is used to store large files reliably and to stream them to user applications at very high bandwidth. Hadoop splits files into large blocks and distributes them among the nodes in the cluster. When data is retrieved from the cloud, it is important to keep computation and communication overhead low. To reduce communication overhead, the server should return only the top-n files that match the keyword in the user's request. Because the owner does not need to keep a local copy of the files, it becomes all the more necessary to check periodically that the files stored on the server are still available and unmodified. In HDFS, computation is performed in parallel, which drastically reduces execution time. The proposed system uses HDFS to retrieve the top-n files, so that search time and communication overhead are greatly reduced.
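
The abstract does not say how files are ranked against the keyword, so the following is only a minimal sketch of the top-n idea under stated assumptions: relevance is taken to be the raw count of keyword occurrences, and the files are held in an in-memory dictionary rather than read from HDFS. The names `keyword_score` and `top_n_files` are hypothetical and do not come from the paper.

```python
import heapq
from collections import Counter


def keyword_score(text: str, keyword: str) -> int:
    """Count case-insensitive occurrences of `keyword` in `text`.

    Assumption: words are separated by whitespace; punctuation attached
    to a word is not stripped in this sketch.
    """
    words = Counter(text.lower().split())
    return words[keyword.lower()]


def top_n_files(files: dict[str, str], keyword: str, n: int) -> list[tuple[str, int]]:
    """Return up to `n` (file name, score) pairs, highest score first.

    Files that do not contain the keyword are never returned, so the
    server sends at most `n` results instead of every stored file.
    """
    scored = ((name, keyword_score(body, keyword)) for name, body in files.items())
    matching = [(name, score) for name, score in scored if score > 0]
    return heapq.nlargest(n, matching, key=lambda item: item[1])
```

In a real deployment the scoring step would presumably run in parallel on the nodes that hold each file's blocks, with only the per-node candidates merged centrally; the sketch shows just the final selection that limits what is sent back to the user.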
