Abstract

Clustering, also known as cluster analysis, is an unsupervised learning problem; that is, it proceeds without human intervention. The technique is widely used in data analysis to identify interesting, useful, or desirable patterns in data. It works by dividing the data into groups of similar objects based on their properties, and each such group is referred to as a cluster. The objects in a cluster are similar to one another and differ from the objects in other clusters. Clustering is important in many areas of data analysis because it reveals the intrinsic grouping of objects in a batch of unlabeled raw data based on their attributes. Cluster analysis has no textbook criterion for what makes a good result, because the process is customized by each user who requires it, for a variety of reasons. Likewise, there is no single best clustering algorithm, because the choice depends heavily on the user's scenario and needs. The purpose of this paper is to compare and contrast two clustering algorithms: k-means and mean shift. The algorithms are compared on the following criteria: time complexity, training, prediction performance, and clustering accuracy.

Highlights

  • The rapid progress of technology [1, 2] in recent years is driving a significant increase in the amount of information created and stored in fields such as education, engineering, training, medicine, and trade, among others [3]

  • After carrying out the research process described in the paper in its entirety, I have concluded that, although both data analysis techniques have their advantages, K-means is better than mean shift in the majority of use cases and situations

  • The training time required by K-means is significantly less than that required by mean shift, and this advantage of K-means is essential where large data sets and many clusters are concerned
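
One plausible reason for the training-time gap noted above: a naive mean shift scans the whole data set every time it moves a single point, so one pass costs on the order of n² distance evaluations, whereas one K-means pass costs about n·k. The sketch below is an illustration only and is not the paper's implementation. It assumes a flat (uniform) kernel, and the names `mean_shift`, `bandwidth`, and `evaluations`, as well as the toy data, are hypothetical.

```python
import math


def mean_shift(points, bandwidth, max_iter=50, tol=1e-4):
    """Naive flat-kernel mean shift (illustrative sketch).

    Returns (modes, labels, evaluations), where evaluations counts
    the distance computations made while shifting the points.
    """
    evaluations = 0
    shifted = []
    for p in points:
        current = p
        for _ in range(max_iter):
            # Each shift scans the full data set for neighbours.
            neighbours = [q for q in points
                          if math.dist(current, q) <= bandwidth]
            evaluations += len(points)
            # Move to the mean of the neighbours inside the window.
            mean = tuple(sum(x) / len(neighbours) for x in zip(*neighbours))
            moved = math.dist(mean, current)
            current = mean
            if moved < tol:
                break
        shifted.append(current)

    # Merge converged positions that lie within one bandwidth of a mode.
    modes, labels = [], []
    for s in shifted:
        for i, m in enumerate(modes):
            if math.dist(s, m) <= bandwidth:
                labels.append(i)
                break
        else:
            modes.append(s)
            labels.append(len(modes) - 1)
    return modes, labels, evaluations


# Toy data (an assumption for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
modes, labels, evaluations = mean_shift(points, bandwidth=2.0)
print(len(modes), labels, evaluations)
```

Note that mean shift needs a bandwidth rather than a number of clusters; the cluster count falls out of how many distinct modes the points converge to.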


Summary

Introduction

The rapid progress of technology [1, 2] in recent years is driving a significant increase in the amount of information created and stored in fields such as education, engineering, training, medicine, and trade, among others [3]. Clustering, also known as cluster analysis, is a type of learning problem that proceeds without human intervention. The technique has been widely used in data analysis, where it helps to observe and identify interesting, useful, or desired patterns in the data [5]. It works by dividing the data into groups of similar objects based on the characteristics it identifies, and each group formed this way is referred to as a cluster. One of the more widely used clustering algorithms to date is K-means, owing to the ease of implementing it and of interpreting its results.
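
To make the idea of dividing data into clusters concrete, here is a minimal sketch of K-means in its common Lloyd's-algorithm form: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. This is an illustration under stated assumptions, not the paper's code. The function name `kmeans`, the random choice of initial centroids, the fixed seed, and the toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples (sketch).

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(x) / len(members) for x in zip(*members)
                )
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Toy data (an assumption for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The two steps are simple enough to trace by hand, and the result is just k centroids plus one label per point, which is part of why the results are easy to interpret.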

