Abstract

The MapReduce framework is an effective technique for processing and analysing large amounts of data. It is a programming model for rapidly processing vast amounts of data in parallel, distributed across a large cluster of machines. Hadoop is an open-source implementation of MapReduce for writing and running MapReduce applications. The problem addressed is which computing environment improves the performance of MapReduce when processing large amounts of data. The experiment uses a stand-alone implementation and a cloud computing implementation to evaluate whether a MapReduce system performs better in cloud computing mode than in stand-alone mode with respect to processing speed, response time and cost efficiency. The comparison uses datasets of different sizes to show how MapReduce processes large datasets in both modes. The finding is that running a MapReduce program to process and analyse large datasets is more efficient in a cloud computing environment than in stand-alone mode.
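
For readers unfamiliar with the programming model, the following is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example. It is not the code used in the experiment and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data in the cloud"]))
```

Because each map call depends only on its own line and each reduce call only on its own key, the work can be split across machines, which is what makes the model suitable for large datasets.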
