Survey on Scientific Data Processing Using Hadoop MapReduce in Cloud Environments

Xiangming Kong

doi:10.1109/icee.2012.1243

Abstract

High-performance processing of scientific data has enabled the development of digital resources and web-based services that support uses of the data beyond those envisioned by the original data producers. Scientific data processing systems must handle data arriving from real-time, high-throughput applications. Timely processing is important, and it requires sufficient available resources to achieve high throughput and deliver accurate results. Cloud computing offers small and medium-sized enterprises a low-cost way to process scientific data. In this paper, we build on Hadoop MapReduce in the cloud and propose a detailed procedure for a scientific data processing algorithm that can improve overall performance in a shared environment while remaining compatible with native Hadoop MapReduce.

Full Text