Abstract

Remote sensing big data (RSBD) is generally characterized by huge volume, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to RSBD problems, they are generally too slow or too inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method that improves both the efficiency and the accuracy of RSBD clustering. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering. This addresses a notable performance bottleneck of existing parallel clustering algorithms: they must perform numerous repeated calculations to reach a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) scaled well as computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.

Highlights

  • Geospatial data are one of the most significant types of big data, and the rapid growth of such data has imposed enormous challenges on current methodologies, applications, and infrastructures [1,2]. With the continuous improvement of earth observation satellite sensors and computer techniques, the volume of satellite remote sensing data has exploded in recent years, and a new research field called remote sensing big data (RSBD) has drawn great attention from academia and industry [3,4,5,6].

  • Parallel subsampling-based clustering (PARSUC) was implemented on a Hadoop platform using the MapReduce parallel model.

  • We present a centroid filtering algorithm (CFA) that filters “bad” centroids out of the ensemble to reach a global consensus on the positions of the centroids in the target clustering.
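
The idea of clustering subsamples independently, pooling their centroids, discarding the "bad" ones, and agreeing on a final set can be illustrated with a small sketch. This is not the authors' PARSUC, SubDP, or CFA implementation, whose details are not given on this page. Every function name, the three steps as written, the vote-based filter rule (`radius`, `min_votes`), and the farthest-first seeding are assumptions made for illustration. In a MapReduce setting, each subsample in step 1 and each block of points in step 3 would presumably be handled by a separate task; here everything runs in one process.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not from the paper)."""
    chosen = [points[0]]
    while len(chosen) < k:
        # Pick the point whose distance to its closest chosen seed is largest.
        chosen.append(max(points, key=lambda p: min(math.dist(p, c) for c in chosen)))
    return chosen


def kmeans(points, k, iters=10):
    """Plain Lloyd iterations on a list of coordinate tuples."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def filter_centroids(pooled, radius, min_votes):
    """Hypothetical stand-in for centroid filtering: keep a centroid only if
    at least min_votes pooled centroids (itself included) lie within radius."""
    return [
        c for c in pooled
        if sum(1 for o in pooled if math.dist(c, o) <= radius) >= min_votes
    ]


def cluster_by_subsampling(points, k, n_subsamples=5, sample_size=60, radius=2.0, seed=0):
    rng = random.Random(seed)
    # Step 1: cluster each random subsample independently and pool the centroids.
    pooled = []
    for _ in range(n_subsamples):
        pooled.extend(kmeans(rng.sample(points, sample_size), k))
    # Step 2: drop centroids that most subsamples do not agree on, then
    # cluster the survivors to obtain k consensus centroids.
    trusted = filter_centroids(pooled, radius, min_votes=n_subsamples // 2 + 1)
    final = kmeans(trusted, k)
    # Step 3: label every point by its nearest consensus centroid.
    labels = [min(range(k), key=lambda i: math.dist(p, final[i])) for p in points]
    return final, labels


def make_blobs(seed=1, per_blob=100):
    """Synthetic 2-D data: three well-separated square blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = [
        (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
        for cx, cy in centers
        for _ in range(per_blob)
    ]
    return centers, points


if __name__ == "__main__":
    centers, points = make_blobs()
    final, labels = cluster_by_subsampling(points, k=3)
    print(sorted(final))
```

The expensive iterative work in this sketch happens only on small subsamples and on the short list of pooled centroids, while the full dataset is touched once, in the final labeling pass. That is one way to read the abstract's claim about avoiding repeated calculations over the whole dataset, though the actual design may differ.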


Summary

Introduction

Geospatial data are one of the most significant types of big data, and the rapid growth of such data has imposed enormous challenges on current methodologies, applications, and infrastructures [1,2]. With the continuous improvement of earth observation satellite sensors and computer techniques, the volume of satellite remote sensing data has exploded in recent years, and a new research field called remote sensing big data (RSBD) has drawn great attention from academia and industry [3,4,5,6]. Mining hidden knowledge from RSBD for different applications, such as natural hazard monitoring, global climate change analysis, and urban planning, imposes significant computational challenges on scientists and researchers [7,8]. Clustering is an important data mining technique widely used in analyzing remote sensing data. It is effective for automatic remote sensing image segmentation and classification because it does not require any training datasets to label the class of each pixel. Among the existing clustering methodologies, K-means [10],

Sensors 2019, 19, 3438; doi:10.3390/s19153438 www.mdpi.com/journal/sensors
