A Strategy of Parallel Seed-Based Image Segmentation Algorithms for Handling Massive Image Tiles over the Spark Platform

Fang Chen, Ning Wang, Yuchu Qin, Lei Wang, Bo Yu

doi:10.3390/rs13101969

Abstract

The volume of remote sensing images continues to grow as image sources diversify and as spatial and spectral resolution increase. Handling such large-volume datasets, which exceed available CPU memory, in a timely and efficient manner is becoming a challenge for single machines. A distributed cluster provides an effective solution with strong computing power, and a growing number of big data technologies have been adopted to process large images using mature parallel technology. However, since most commercial big data platforms are not specifically developed for the remote sensing field, two main issues arise when processing large images on big data platforms over a distributed cluster. First, the number and variety of official algorithms for processing remote sensing images on big data platforms are limited compared with the large number of sequential algorithms. Second, sequential algorithms applied directly to process large images in parallel over a distributed cluster may produce incomplete objects at the tile edges and generate large communication volumes at the shuffle stage. It is therefore necessary to explore a distributed strategy and adapt the sequential algorithms to the distributed cluster. In this research, we employed two seed-based image segmentation algorithms to construct a distributed strategy based on the Spark platform. The proposed strategy focuses on modifying the incomplete objects by processing border areas, and on reducing the communication volume to a reasonable size by limiting the auxiliary bands and the buffer size to a small range during the shuffle stage. We calculated the F-measure and execution time to evaluate accuracy and execution efficiency. The statistical data reveal that both segmentation algorithms maintained high accuracy, consistent with that achieved in the reference image segmented sequentially. Moreover, the strategy generally took less execution time than configurations with significantly larger auxiliary bands and buffer sizes. The proposed strategy can modify incomplete objects, with execution twice as fast as strategies that do not employ communication volume reduction in the distributed cluster.
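The role of the buffer size can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the names `TileWindow`, `tile_windows`, and `buffer_overhead`, and the square-tile layout, are assumptions made here for illustration. The sketch splits an image extent into tiles, extends each tile by a buffer strip clipped at the image edge, and reports how many pixels the buffers duplicate, which serves as a rough proxy for the extra data that would have to be exchanged between tiles.

```python
from typing import List, NamedTuple


class TileWindow(NamedTuple):
    """Pixel bounds of one tile (end indices exclusive).

    The *_buf fields give the same tile extended by the buffer strip.
    """
    row0: int
    row1: int
    col0: int
    col1: int
    row0_buf: int
    row1_buf: int
    col0_buf: int
    col1_buf: int


def tile_windows(height: int, width: int,
                 tile_size: int, buffer_px: int) -> List[TileWindow]:
    """Split a height x width image into square tiles with a clipped buffer."""
    if tile_size <= 0 or buffer_px < 0:
        raise ValueError("tile_size must be positive and buffer_px non-negative")
    windows = []
    for row0 in range(0, height, tile_size):
        row1 = min(row0 + tile_size, height)
        for col0 in range(0, width, tile_size):
            col1 = min(col0 + tile_size, width)
            windows.append(TileWindow(
                row0, row1, col0, col1,
                max(row0 - buffer_px, 0), min(row1 + buffer_px, height),
                max(col0 - buffer_px, 0), min(col1 + buffer_px, width),
            ))
    return windows


def buffer_overhead(windows: List[TileWindow]) -> float:
    """Ratio of buffered pixels to core pixels; 1.0 means no duplication."""
    core = sum((w.row1 - w.row0) * (w.col1 - w.col0) for w in windows)
    buffered = sum((w.row1_buf - w.row0_buf) * (w.col1_buf - w.col0_buf)
                   for w in windows)
    return buffered / core


if __name__ == "__main__":
    # Hypothetical sizes: a 100 x 100 image, 50-pixel tiles, 5-pixel buffer.
    tiles = tile_windows(100, 100, tile_size=50, buffer_px=5)
    print(len(tiles), buffer_overhead(tiles))
```

In this toy layout the duplicated pixel count grows with the buffer width, which is consistent with the motivation for keeping the buffer size small during the shuffle stage.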

Highlights

The volume of remote sensing images has been increasing exponentially over the last two decades
Objects in the images are outlined with black border lines, the modified border area objects are marked with red border lines, and the seed points are marked in yellow
For the segmentation results generated by the region growing method (Figure 11), object sizes grow as the seed point density decreases from level 5 to level 1

Summary

Introduction

The volume of remote sensing images has been increasing exponentially over the last two decades, alongside detailed observation of the earth’s surface with higher temporal and spatial resolution [1,2]. Such progress is due mainly to advances in sensor technologies and reductions in the costs of producing and launching satellites [3,4]. The processing of tremendously large images is becoming a challenge for end users [5], as high spatial resolution images increase the richness and complexity of the information [6]. There will be an increased demand for highly efficient image processing technologies capable of handling very large remote sensing images [9,10].

Objectives

Methods

Results

Discussion

Conclusion