Abstract

Since Landsat-1 first started to deliver volumes of pixels in 1972, the volumes of archived data in remote sensing data centers have increased continuously. Because satellite orbit parameters and sensor specifications vary, the storage formats, projections, spatial resolutions, and revisit periods of these archived data differ widely. In addition, the remote sensing data that each data center receives continuously arrive at increasingly high data rates, and newly received data should be ingested and archived promptly so that users can reach the latest data through retrieval and distribution services. Hence, a sound data integration, organization, and management program is urgently needed. However, the multi-source, massive, heterogeneous, and distributed storage features of remote sensing data have not only made integration across distributed data center spatial infrastructures difficult, but have also left the current modes of data organization and management unable to meet users' requirements for rapid retrieval and access. This paper therefore proposes an object-oriented data technology (OODT) and SolrCloud-based remote sensing data integration and management framework across a distributed data center spatial infrastructure. In this framework, all of the remote sensing metadata in the distributed sub-centers are transformed into a unified format based on International Organization for Standardization (ISO) 19115, and then ingested and transferred to the main center by OODT components, continuously or at regular intervals. In the main data center, to improve the efficiency of massive data retrieval, we propose a logical segmentation indexing (LSI) model-based data organization approach and use SolrCloud to implement distributed indexing and retrieval of massive metadata.
Finally, a series of distributed data integration, retrieval, and comparative experiments showed that the proposed distributed data integration and management program is effective and promises superior results. Specifically, the LSI model-based data organization and the SolrCloud-based distributed indexing schema were able to effectively improve the efficiency of massive data retrieval.

Highlights

  • Since Landsat-1 first started to deliver volumes of pixels in 1972, the amount of archived remote sensing data stored by data centers has increased continuously [1,2]

  • Case 1: The spatial and time query parameters remained unchanged. In this case: (a) when the amount of metadata was less than 7.5 million items, the time consumption of the logical segmentation indexing (LSI) model-based retrieval method was slightly lower than that of longitude- and latitude-based data retrieval; (b) as the metadata volume increased, the LSI model-based data retrieval became more efficient than the longitude- and latitude-based data retrieval; (c) when the amount of metadata was less than 5.5 million items, the time consumption of LSI model-based metadata retrieval on a single Solr node differed little from that of SolrCloud; (d) when the metadata volume increased, the retrieval speed differences between

  • Case 2: The spatial query parameters remained unchanged but the time frames changed. In this case: (a) as the query time frames widened, the time consumed showed an overall upward trend for both SolrCloud and the single Solr node, although the trend was not pronounced; this behavior could be attributed to the inverted index of SolrCloud and Solr; and (b) the query time increased little as the query time frames widened in the HBase cluster

Summary

Introduction

Since Landsat-1 first started to deliver volumes of pixels in 1972, the amount of archived remote sensing data stored by data centers has increased continuously [1,2]. The two most widely used data organization models are: (1) spatio-temporal recording system-based satellite orbit stripes or scene organization; and (2) globally meshed grid-based data tiling organization [8]. The former has obvious shortcomings for massive data retrieval and quick access, and the latter increases the amount of data by about one-third due to image segmentation, requiring larger data storage spaces. The LSI model takes the logical segmentation indexing code as the identifier of each remote sensing data item, rather than performing an actual physical subdivision. This increases the efficiency of data retrieval with the help of the global subdivision index, and avoids generating the numerous small files that physical subdivision of data would produce. This paper is organized as follows: Section 2 provides an overview of the background knowledge and related work; Section 3 describes the distributed multi-source remote sensing metadata transformation and integration; Section 4 details the data management methods, including the LSI spatial organization model, full-text index construction, and distributed data retrieval; Section 5 introduces the experiments and provides an analysis of the proposed program; and Section 6 provides a summary and conclusions.
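
The idea of indexing by a logical grid code, rather than physically cutting images into tiles, can be illustrated with a minimal Python sketch. It rests on loud assumptions: it uses a plain quadtree over longitude [-180, 180] and latitude [-90, 90], which is not the paper's LSI encoding or the GeoSOT grid named in the outline, and the function names (`cell_code`, `covering_codes`, `build_index`, `query`), the scene identifiers, and the bounding boxes are all hypothetical. Each scene stays a single file; only its metadata record is tagged with the codes of the grid cells it overlaps.

```python
from collections import defaultdict


def cell_code(lon, lat, level):
    """Quadtree code (one digit 0-3 per level) of the cell containing a point."""
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(level):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        quadrant = 0
        if lon >= mid_lon:      # eastern half
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


def covering_codes(west, south, east, north, level):
    """Codes of all cells at `level` that intersect a bounding box."""
    n = 2 ** level
    cell_w, cell_h = 360.0 / n, 180.0 / n
    col_min = max(0, int((west + 180.0) // cell_w))
    col_max = min(n - 1, int((east + 180.0) // cell_w))
    row_min = max(0, int((south + 90.0) // cell_h))
    row_max = min(n - 1, int((north + 90.0) // cell_h))
    codes = set()
    for row in range(row_min, row_max + 1):
        for col in range(col_min, col_max + 1):
            # Encode each covered cell through its center point.
            center_lon = -180.0 + (col + 0.5) * cell_w
            center_lat = -90.0 + (row + 0.5) * cell_h
            codes.add(cell_code(center_lon, center_lat, level))
    return codes


def build_index(scenes, level):
    """Map each grid code to the scene IDs whose footprint touches that cell."""
    index = defaultdict(set)
    for scene_id, bbox in scenes.items():
        for code in covering_codes(*bbox, level):
            index[code].add(scene_id)
    return index


def query(index, bbox, level):
    """Return candidate scene IDs by exact code lookup, with no geometry test."""
    hits = set()
    for code in covering_codes(*bbox, level):
        hits |= index.get(code, set())
    return hits


# Hypothetical footprints given as (west, south, east, north) in degrees.
scenes = {
    "scene_a": (100.0, 30.0, 120.0, 50.0),
    "scene_b": (-80.0, -20.0, -60.0, -5.0),
}
index = build_index(scenes, level=4)
candidates = query(index, (110.0, 35.0, 115.0, 40.0), level=4)
```

In this sketch a spatial query reduces to a handful of exact-match lookups on code strings, which is the kind of access an inverted index handles well. The result is a candidate set: a scene that shares a cell with the query box but does not actually intersect it would still be returned, so a real system would follow up with an exact footprint check.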

Distributed Integration of Remote Sensing Data
Spatial Organization of Remote Sensing Data
OODT: A Data Integration Framework
Distributed Integration of Multi-Source Remote Sensing Data
The ISO 19115-Based Metadata Transformation
Distributed Multi-Source Remote Sensing Data Integration
Spatial Organization and Management of Remote Sensing Data
LSI Organization Model of Multi-Source Remote Sensing Data
Full-Text Index of Multi-Sourced Remote Sensing Metadata
Distributed Data Retrieval
Experiment and Analysis
Distributed Data Integration Experiment
LSI Model-Based Metadata Retrieval Experiment
GeoSOT Grids
Comparative Experiments and Analysis
Comparative Experiments
Results Analysis
Conclusions