A Spark-Based Parallel Fuzzy $c$-Means Segmentation Algorithm for Agricultural Image Big Data

Bin Liu, Dongjian He, Yin Zhang, Mohsen Guizani, Songrui He

doi:10.1109/access.2019.2907573

Abstract

With the explosive growth of image big data in agriculture, image segmentation algorithms face unprecedented challenges. As one of the most important image segmentation techniques, the fuzzy c-means (FCM) algorithm has been widely used for agricultural image segmentation because it is simple to compute and yields high-quality segmentation. However, because of its large amount of computation, the sequential FCM algorithm is too slow to finish the segmentation task within an acceptable time. This paper proposes a parallel FCM segmentation algorithm for agricultural image big data, based on the distributed memory computing platform Apache Spark. The input image is first converted from the RGB color space to the Lab color space, and point cloud data are generated from it. The point cloud data are then partitioned and stored on different computing nodes, where the membership degrees of pixel points to the cluster centers are calculated and the cluster centers are updated iteratively in a data-parallel form until the stopping condition is satisfied. Finally, the point cloud data are restored after clustering to reconstruct the segmented image. Evaluated on the Spark platform, the parallel FCM algorithm reaches an average speedup of 12.54 on ten computing nodes. The experimental results show that the Spark-based parallel FCM algorithm obtains a significant increase in speedup and, on the agricultural image test set, delivers a performance improvement of 128% over the Hadoop-based approach. This paper indicates that the Spark-based parallel FCM algorithm segments agricultural image big data faster and has better scale-up and size-up rates.
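The data-parallel iteration summarized above can be illustrated with a minimal pure-Python sketch. It is not the paper's Spark implementation: partitions are modeled as plain lists, the map and reduce steps are ordinary function calls, and the points are assumed to be already in Lab coordinates. The function names, the fuzzifier `m = 2`, and the stopping rule (largest center shift below `eps`) are assumptions made for illustration only.

```python
from functools import reduce


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point to every center (fuzzifier m > 1)."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # A point that coincides with a center belongs fully to that center.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 if i == zeros[0] else 0.0 for i in range(len(centers))]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def partial_sums(partition, centers, m=2.0):
    """Map step: per-partition sums of u^m * x and u^m for each cluster."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0] * len(centers)
    for point in partition:
        for j, u in enumerate(memberships(point, centers, m)):
            w = u ** m
            den[j] += w
            for k in range(dim):
                num[j][k] += w * point[k]
    return num, den


def merge(a, b):
    """Reduce step: add two partial results element-wise."""
    num = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [x + y for x, y in zip(a[1], b[1])]
    return num, den


def fcm(partitions, centers, m=2.0, eps=1e-6, max_iter=100):
    """Iterate map/reduce center updates until the centers stop moving."""
    for _ in range(max_iter):
        num, den = reduce(merge, (partial_sums(p, centers, m) for p in partitions))
        new_centers = [
            [n / d for n in row] if d > 0 else old
            for row, d, old in zip(num, den, centers)
        ]
        # Assumed stopping condition: largest Euclidean shift of any center.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(c_new, c_old)) ** 0.5
            for c_new, c_old in zip(new_centers, centers)
        )
        centers = new_centers
        if shift < eps:
            break
    return centers
```

Because each partition contributes only its partial sums, the center update gives the same result however the points are split, which is the property that lets the work be spread across computing nodes.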

Full Text