Evaluation of Data Processing Using MapReduce Framework in Cloud and Stand-Alone Computing

Samira Daneshyar

doi:10.5121/ijdps.2012.3605

Abstract

The MapReduce framework provides an effective technique for processing and analysing large amounts of data. It is a programming model used to process vast amounts of data rapidly, in parallel and in distributed mode, on a large cluster of machines. Hadoop, an open-source implementation of MapReduce, is used for writing and running MapReduce applications. The problem addressed here is to determine which computing environment improves the performance of MapReduce when processing large amounts of data. Stand-alone and cloud computing implementations are used in the experiment to evaluate whether running a MapReduce system in cloud computing mode performs better than in stand-alone mode with respect to processing speed, response time and cost efficiency. The comparison uses datasets of different sizes to show how MapReduce handles large datasets in both modes. The finding is that running a MapReduce program to process and analyse large datasets is more efficient in a cloud computing environment than in stand-alone mode.
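To make the programming model concrete, the following is a minimal, single-machine sketch of the map, shuffle and reduce phases applied to word counting, the customary introductory MapReduce example. It is written in Python for brevity and does not use Hadoop's Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool merely stands in for the cluster of machines across which Hadoop would distribute map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Map tasks are independent per split, so they can run concurrently.
    # (Hypothetical stand-in for distributing tasks across cluster nodes.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    # Flatten the per-split outputs and group the pairs by key.
    groups = shuffle(pair for split_output in mapped for pair in split_output)
    # Each key is reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(splits))
```

In Hadoop the framework itself performs the shuffle and schedules map and reduce tasks across nodes; this sketch only mirrors the data flow. Because of Python's global interpreter lock, the thread pool illustrates the independence of map tasks rather than delivering a real speed-up.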
