Abstract
Various sensors on airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other applications. However, efficiently storing, querying, and processing such big data is challenging because the task is both data- and computing-intensive. In this paper, a Hadoop-based framework is proposed to manage and process big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be fetched directly from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo Toolbox (OTB), a ready-to-use tool for large-image processing, is integrated into MapReduce to provide a rich set of image processing operations. With HDFS, the Orfeo Toolbox, and MapReduce integrated, remote sensing images can be processed directly in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process such big remote sensing data.
Highlights
Big Data, referring to the enormous volume, velocity, and variety of data (NIST Cloud/BigData Workshop, 2014), has become one of the biggest technology shifts of the 21st century (Mayer-Schönberger and Cukier, 2013)
RS image processing reads these data into memory before further analysis, so data I/O has become the bottleneck for high-performance computing (HPC) approaches to processing RS images
Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. It is composed of Hadoop Common, Hadoop Distributed File System, Hadoop YARN and Hadoop MapReduce
Summary
Big Data, referring to the enormous volume, velocity, and variety of data (NIST Cloud/BigData Workshop, 2014), has become one of the biggest technology shifts of the 21st century (Mayer-Schönberger and Cukier, 2013). Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. It is composed of Hadoop Common, the Hadoop Distributed File System, Hadoop YARN, and Hadoop MapReduce. To address the challenges posed by processing big RS data, this paper proposes a Hadoop-based distributed framework to efficiently manage and process big RS image data. This framework distributes RS images among the nodes in a cluster. By integrating the functions of the Orfeo Toolbox (OTB) libraries into MapReduce, these RS images can be directly processed in parallel.
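The map/reduce pattern the framework applies to distributed RS images can be sketched conceptually as follows. This is only an illustrative simulation using Python's standard library, not the paper's Hadoop/OTB implementation: the tile names, the per-tile statistic (mean pixel value), and the reduce step are all assumptions chosen for clarity.

```python
# Conceptual sketch of the MapReduce pattern applied to image tiles,
# simulated locally with a process pool. In the paper's framework the
# map step would instead invoke an OTB operation on an image stored in
# HDFS, and Hadoop would schedule the tasks across cluster nodes.
from multiprocessing import Pool


def map_tile(tile):
    """Map step: compute a per-tile statistic (here, the mean pixel value)."""
    name, pixels = tile
    return name, sum(pixels) / len(pixels)


def reduce_stats(mapped):
    """Reduce step: aggregate per-tile results into a scene-level mean."""
    return sum(value for _, value in mapped) / len(mapped)


if __name__ == "__main__":
    # Two toy "tiles"; a real scene would be split into many image chunks.
    tiles = [("tile_0", [10, 20, 30]), ("tile_1", [40, 50, 60])]
    with Pool(2) as pool:
        mapped = pool.map(map_tile, tiles)   # tiles processed in parallel
    print(reduce_stats(mapped))              # scene-level mean: 35.0
```

The key point of the pattern is that each tile is processed independently in the map step, so adding nodes (or worker processes) scales the throughput, while the reduce step combines the partial results.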