Abstract

Cloud computing provides a suitable platform for hosting large-scale data-intensive applications. MapReduce is both a programming model and a framework that supports that model; its main idea is to hide the details of parallel execution and let users focus solely on data processing strategies. Hadoop is an open-source implementation of MapReduce. For the storage and analysis of large online or streaming data, most organizations are moving toward Apache Hadoop and HDFS. Applications such as log processors and search engines use Hadoop MapReduce for computation and HDFS for storage. Although Hadoop is popular for the analysis, storage, and processing of very large data sets, changes to the Hadoop system are needed: there is no mechanism to identify duplicate computations, which increases processing time and causes unnecessary data transmission. This work co-locates related files by considering their content, using a locality sensitive hashing algorithm. Storing related files on the same cluster node, combined with a cache mechanism, improves data locality and avoids repeated execution of tasks, both of which speed up Hadoop's execution.

Keywords: Distributed file system, Datanode, Locality Sensitive Hashing
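To illustrate the content-based grouping idea described above, the following is a minimal MinHash-style locality sensitive hashing sketch in Java. It is only an illustration under assumed choices (word 3-shingles, a 32-value signature, the class name MinHashSimilarity, and a similarity threshold of 0.3); the paper's actual placement policy, hash configuration, and caching layer are not reproduced here.

import java.util.*;

/**
 * Minimal MinHash sketch: files whose content shingles overlap heavily
 * receive similar signatures, so a placement policy could route them to
 * the same DataNode. Names and parameters are illustrative, not taken
 * from the paper's implementation.
 */
public class MinHashSimilarity {

    private static final int NUM_HASHES = 32;        // signature length (assumed)
    private static final long PRIME = 2147483647L;   // large prime for the hash family
    private final long[] a = new long[NUM_HASHES];
    private final long[] b = new long[NUM_HASHES];

    public MinHashSimilarity(long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < NUM_HASHES; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1);
            b[i] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    /** Break file content into word 3-shingles (an assumed tokenisation). */
    private Set<Integer> shingles(String content) {
        String[] w = content.toLowerCase().split("\\s+");
        Set<Integer> out = new HashSet<>();
        for (int i = 0; i + 2 < w.length; i++) {
            out.add((w[i] + " " + w[i + 1] + " " + w[i + 2]).hashCode());
        }
        return out;
    }

    /** MinHash signature: for each hash function, keep the smallest hashed shingle. */
    public long[] signature(String content) {
        long[] sig = new long[NUM_HASHES];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int s : shingles(content)) {
            long x = (s & 0xffffffffL) % PRIME;       // treat shingle hash as unsigned, keep products in range
            for (int i = 0; i < NUM_HASHES; i++) {
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    /** Estimated Jaccard similarity = fraction of signature positions that agree. */
    public static double estimatedSimilarity(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) same++;
        return (double) same / s1.length;
    }

    public static void main(String[] args) {
        MinHashSimilarity lsh = new MinHashSimilarity(42);
        String logA  = "error connection refused by server at port 8020 retrying the request now";
        String logB  = "error connection refused by server at port 8021 retrying the request now";
        String other = "quarterly revenue report for the sales and marketing department of the company";

        long[] sa = lsh.signature(logA), sb = lsh.signature(logB), so = lsh.signature(other);
        // A placement layer could route files whose estimated similarity exceeds a
        // threshold (e.g. 0.3, an assumed value) to the same DataNode.
        System.out.printf("sim(logA, logB)  = %.2f%n", estimatedSimilarity(sa, sb));
        System.out.printf("sim(logA, other) = %.2f%n", estimatedSimilarity(sa, so));
    }
}

In this sketch, the two related log files yield a markedly higher estimated similarity than the unrelated file, which is the signal a co-location policy would act on; the cache mechanism for avoiding repeated task execution is a separate component not shown here.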
