Abstract

Data clustering algorithms have long been used for image segmentation. Because real-life datasets contain uncertainty, several uncertainty-based data clustering algorithms have been developed. The c-means clustering algorithms form one such family. Starting with fuzzy c-means (FCM), the basic subfamily of this family comprises rough c-means (RCM), intuitionistic fuzzy c-means (IFCM), and their hybrids, such as rough fuzzy c-means (RFCM) and rough intuitionistic fuzzy c-means (RIFCM). The algorithms in this basic subfamily use the Euclidean distance to measure the similarity of data. However, the subfamily obtained by replacing the Euclidean distance with kernel-based similarity measures produced better results. These algorithms are especially useful for clustering data points that are linearly inseparable in the original input space. During this period, Krishnapuram and Keller inferred that the membership constraints in some rudimentary uncertainty-based clustering techniques, such as fuzzy c-means, impart a probabilistic nature to them, and they therefore suggested a possibilistic version. In fact, all the other member algorithms of the basic subfamily have since been extended to incorporate this notion. Currently, the use of image data is growing rapidly and constantly, reaching volumes that amount to big data. Moreover, since image segmentation is one of the most time-consuming processes, industry needs algorithms that can solve this problem rapidly and with high accuracy. In this paper, we propose combining the kernel and possibilistic approaches in a distributed environment provided by Apache™ Hadoop.
We integrate this combined notion with the MapReduce paradigm of Hadoop and put forth three novel algorithms: Hadoop-based possibilistic kernelized rough c-means (HPKRCM), Hadoop-based possibilistic kernelized rough fuzzy c-means (HPKRFCM), and Hadoop-based possibilistic kernelized rough intuitionistic fuzzy c-means (HPKRIFCM), and we study their efficiency in image segmentation. We compare their running times and analyze their efficiency against the corresponding algorithms from the other three subfamilies on four different types of images, with three different kernels and six different efficiency measures: the Davies-Bouldin index (DB), Dunn index (D), alpha index (α), rho index (ρ), alpha star index (α*), and gamma index (γ). Our analysis shows that HPKRIFCM with the hyper-tangent kernel performs best for image segmentation among all these clustering algorithms. Also, the proposed algorithms take drastically less time to render segmented images than the other algorithms. The algorithms were implemented in Java, and for the proposed algorithms we used the Hadoop framework installed on CentOS. For statistical plotting we used matplotlib (a Python library).
