Abstract

Various sensors on airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other applications. However, efficiently storing, querying and processing such big data is challenging because the workloads are both data- and computing-intensive. In this paper, a Hadoop-based framework is proposed to manage and process big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be fetched directly from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo Toolbox (OTB), a ready-to-use tool for large-image processing, is integrated into MapReduce to provide a rich set of image processing operations. With the integration of HDFS, the Orfeo Toolbox and MapReduce, remote sensing images can be processed directly in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process big remote sensing data.

Highlights

  • Big Data, referring to the enormous volume, velocity, and variety of data (NIST Cloud/BigData Workshop, 2014), has become one of the biggest technology shifts in the 21st century (Mayer-Schönberger and Cukier, 2013)

  • RS image processing first reads the data into memory for analysis, so data I/O has become the bottleneck when using high-performance computing (HPC) to process RS images

  • Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. It is composed of Hadoop Common, Hadoop Distributed File System, Hadoop YARN and Hadoop MapReduce


Summary

1. INTRODUCTION

Big Data, referring to the enormous volume, velocity, and variety of data (NIST Cloud/BigData Workshop, 2014), has become one of the biggest technology shifts in the 21st century (Mayer-Schönberger and Cukier, 2013). Hadoop is an open-source software framework for the distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. It is composed of Hadoop Common, the Hadoop Distributed File System, Hadoop YARN and Hadoop MapReduce. To address the challenges posed by processing big RS data, this paper proposes a Hadoop-based distributed framework to efficiently manage and process big RS image data. This framework distributes RS images among the nodes of a cluster. By integrating the functions of the OTB libraries into MapReduce, these RS images can be processed directly in parallel.
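As a minimal sketch of the idea (not the authors' implementation), the map phase of such a framework can be thought of as applying an image operation independently to each stored tile. The Python stand-in below uses `multiprocessing` in place of Hadoop MapReduce and a simple per-pixel threshold in place of an OTB operator; all names (`split_into_tiles`, `process_tile`, `process_image`) are hypothetical:

```python
from multiprocessing import Pool

def split_into_tiles(image, tile_size):
    """Partition a flat list of pixel values into fixed-size tiles,
    analogous to an image being split into HDFS blocks."""
    return [image[i:i + tile_size] for i in range(0, len(image), tile_size)]

def process_tile(tile):
    """Map task: apply a per-pixel operation to one tile.
    A simple threshold stands in for an OTB image operator."""
    return [1 if px > 128 else 0 for px in tile]

def process_image(image, tile_size=4, workers=2):
    """Run the map phase over all tiles in parallel, then
    concatenate the per-tile results (a trivial reduce step)."""
    tiles = split_into_tiles(image, tile_size)
    with Pool(workers) as pool:
        results = pool.map(process_tile, tiles)
    return [px for tile in results for px in tile]

if __name__ == "__main__":
    image = [0, 200, 130, 50, 255, 10, 129, 128]
    print(process_image(image))
```

The key property this toy example shares with the proposed framework is that each tile is processed without reference to any other tile, which is what lets HDFS-resident image blocks be handed to map tasks on whichever cluster node stores them.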

2. RELATED WORKS
Data Management
Data Partition Period
Map Period
Cluster Environment
Experiment Results
5. CONCLUSION & DISCUSSION