PERFORMANCE EVALUATION OF SELECTED DISTANCE-BASED AND DISTRIBUTION-BASED CLUSTERING ALGORITHMS

Ajiboye A R, Olufadi H I
Department of Computer Science, Faculty of Communication & Information Sciences, University of Ilorin, Ilorin, Nigeria

doi:10.15282/ijsecs.4.2.2018.3.0047

Open Access

Abstract

Clustering is an automated search for hidden patterns in a dataset that unveils groups of related observations. The technique is one of the viable means of revealing the patterns or internal structure of the data within a collection. Choosing the right algorithm to achieve clusters of good quality is usually a challenge, especially when the number of clusters cannot be predetermined. This study evaluates selected clustering algorithms on their ability to find quality clusters in a data set. To this end, a prominent technique from each of the distance-based and distribution-based families, specifically the k-means and EM clustering algorithms respectively, is implemented. The data set on which the algorithms were run comprises 1,309 records of information on passengers who boarded a ship, retrieved from the RapidMiner open repository. Experiments were conducted and clusters were formed based on the chosen number of partitions, k. The quality of the clusters formed is measured using an external criterion, Normalized Mutual Information (NMI). The results show that the distance-based algorithm finds clusters of higher quality, with an NMI value of 0.912 out of a maximum achievable value of 1. The experiments further report the average execution time each algorithm takes to form the cluster model. The findings also offer useful insight into the choice of clustering algorithm as regards support for particular data types and ease of execution.

Keywords: clustering, data mining, k-means, EM-clustering, unsupervised learning.

Highlights

Clustering analysis is generally regarded as an unsupervised learning approach that seeks to identify or group objects based on their similarity features.
The results of evaluating the quality of the clusters formed by the clustering algorithms implemented in this study are presented.
To determine the quality of the clusters formed, the Normalized Mutual Information (NMI) is computed.
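
As an illustration of how NMI can be computed as an external criterion, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the arithmetic-mean normalization, NMI = 2 I(U; V) / (H(U) + H(V)), which is one of several variants in common use; the text here does not say which variant the study applied. The function names and the sample labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(true_labels, cluster_labels):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labellings must cover the same objects")
    denom = entropy(true_labels) + entropy(cluster_labels)
    if denom == 0:
        # Both labellings put everything in one group: treat as identical.
        return 1.0
    return 2 * mutual_information(true_labels, cluster_labels) / denom


# Hypothetical labels, for illustration only (not data from the study).
known_classes = ["a", "a", "b", "b", "b", "c"]
found_clusters = [0, 0, 1, 1, 2, 2]
print(round(nmi(known_classes, found_clusters), 3))
```

The score does not depend on how the clusters are named, so a clustering that matches the known classes up to relabelling reaches the maximum of 1, while a clustering that is statistically independent of the classes scores 0.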

Summary

Introduction

Clustering analysis is generally regarded as an unsupervised learning approach that seeks to identify or group objects based on their similarity features. Clustering techniques using the k-means (Suh, 2012) and k-medoids (Berkhin, 2006) algorithms are typical distance-based approaches. Distribution-based approaches, by contrast, attempt to reproduce the observed data points as a mixture of predefined probability distribution functions (McLachlan et al., 2008). Clustering is a descriptive technique that is useful in several areas, especially for classification purposes. It is an unsupervised learning technique because it groups data objects without consulting class labels (Han et al., 2012); it automatically unveils the hidden features or patterns in the dataset.
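
The contrast between the two families can be sketched in a few lines of plain Python. This is a simplified one-dimensional illustration, not the implementation used in the study: `kmeans_1d` assigns each point to its nearest center (a hard, distance-based assignment), while `em_gmm_1d` fits a mixture of Gaussians and gives each point a probability of belonging to each component (a soft, distribution-based assignment). The function names, the starting values, and the sample data are all assumptions made for the example, and the EM routine omits the numerical safeguards a production version would need.

```python
from math import exp, pi, sqrt


def kmeans_1d(xs, centers, iters=20):
    """Lloyd's algorithm in 1-D: assign to nearest center, then re-average."""
    centers = list(centers)

    def nearest(x):
        return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x)].append(x)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, [nearest(x) for x in xs]


def gaussian_pdf(x, mean, var):
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_gmm_1d(xs, means, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture; returns means and responsibilities."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: probability that each point came from each component.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the
        # responsibility-weighted data.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, xs)) / n_j
            variances[j] = max(min_var, var_j)
    return means, resp


# Hypothetical data with two well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(data, [0.0, 6.0])
gmm_means, responsibilities = em_gmm_1d(data, [0.0, 6.0])
print(centers, labels)
print(gmm_means)
```

On well-separated data such as this, the two methods find essentially the same groups. They differ mainly where groups overlap, because EM reports a degree of membership for each point rather than a single label.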

Journal: International Journal of Software Engineering and Computer Systems
Publication Date: Aug 30, 2018
Citations: 1
License type: cc-by
