Abstract

Scientific computation and data-intensive analyses are ever more frequent. On the one hand, the MapReduce programming model has gained a lot of attention for its applicability to large parallel data analyses and Big Data applications. On the other hand, Cloud computing is increasingly attractive for solving computing problems that demand large amounts of resources. This paper explores the potential symbiosis between MapReduce and Cloud computing in order to create a robust and scalable environment for executing MapReduce workflows regardless of the underlying infrastructure. The main goal of this work is to provide an easy-to-install interface, so that non-expert scientists can deploy a suitable testbed for their MapReduce experiments on the local resources of their institution. Test cases were run to evaluate the time required for the whole execution process on a real cluster.

Highlights

  • Scientific Computing enables new kinds of experiments that would have been impossible only a decade ago

  • The MapReduce programming model abstracts away the common difficulties of distributed processing on large clusters by offering a simple and efficient way of processing large data sets with a parallel, distributed algorithm

  • Although it has been argued that MapReduce is not well suited to many scientific algorithms, a recent work [2] studied how to adapt different classes of algorithms to the MapReduce model and concluded that the programming model can be used successfully even for solving complex scientific computing problems


Introduction

Scientific Computing enables new kinds of experiments that would have been impossible only a decade ago. Big Data science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis a big challenge. Two issues must be addressed: finding an effective method to tackle such challenging problems, and obtaining the necessary resources to solve them. MapReduce [1] may help address the first issue. The MapReduce programming model abstracts away the common difficulties of distributed processing on large clusters by offering a simple and efficient way of processing large data sets with a parallel, distributed algorithm. Although it has been argued that MapReduce is not well suited to many scientific algorithms, a recent work [2] studied how to adapt different classes of algorithms to the MapReduce model and concluded that the programming model can be used successfully even for solving complex scientific computing problems.
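
To make the model concrete, the sketch below shows the classic word-count example expressed as a map and a reduce function. It is a minimal, single-process illustration in plain Python, not the authors' system and not tied to any particular MapReduce framework: the map step emits intermediate key-value pairs, a shuffle step groups them by key (work a real framework performs across the cluster), and the reduce step aggregates each group.

    from collections import defaultdict

    def map_phase(document):
        # Map: emit an intermediate (word, 1) pair for every word.
        for word in document.split():
            yield word.lower(), 1

    def shuffle(pairs):
        # Shuffle: group intermediate values by key, as the framework would.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups.items()

    def reduce_phase(key, values):
        # Reduce: aggregate all counts observed for a single word.
        return key, sum(values)

    documents = ["the quick brown fox", "the lazy dog", "the fox"]
    pairs = (pair for doc in documents for pair in map_phase(doc))
    counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs))
    print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}

Because the map and reduce functions are side-effect free, a framework can run many instances of each in parallel on different partitions of the input, which is what makes the model attractive for large clusters.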
