Clustering Techniques in Data Mining: A Comparative Analysis

E. Kesavulu Reddy

doi:10.9734/bpi/mono/978-93-5547-265-6/ch9

Abstract

Data mining is the process of extracting useful information from large amounts of data stored in databases, data warehouses, or other information repositories. Data mining tasks are classified into two types: predictive and descriptive. Clustering groups data elements without prior knowledge of how the groups are described. It is a form of unsupervised learning and is used to discover new sets of categories. Clustering algorithms are classified into hierarchical, partitioning, grid-based, density-based, model-based, and constraint-based algorithms. Hierarchical clustering is connectivity-based. Partitioning is a center-based clustering method in which the number of clusters, k, is fixed in advance, as in k-means. Density-based clusters are defined as regions of higher density than the rest of the data set. Grid-based clustering has the shortest processing time, which typically depends on the number of grid cells rather than on the number of data objects. This paper compares the performance of three clustering algorithms: hierarchical clustering, density-based clustering, and k-means clustering.
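The abstract does not give the chapter's implementations or data, so the following is only a minimal sketch, assuming a hypothetical nine-point toy data set (`POINTS`) and plain-Python versions of the three compared families. The function names `kmeans`, `single_linkage`, and `dbscan` and the parameters `k`, `eps`, and `min_pts` are illustrative choices, not the author's code. Single linkage is just one of several hierarchical linkage criteria, DBSCAN is one particular density-based algorithm, and k-means here uses a deterministic farthest-first seeding (swapped in for the usual random initialization so the output is reproducible). It requires Python 3.8+ for `math.dist`.

```python
import math

# Hypothetical toy data (not from the chapter): two tight groups plus one outlier.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]


def kmeans(points, k, max_iter=100):
    """Partitioning (center-based) clustering: k is fixed in advance."""
    # Assumption: deterministic farthest-first seeding instead of random seeds.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels


def single_linkage(points, k):
    """Hierarchical (connectivity-based) clustering: agglomerative, single linkage."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge cluster b into cluster a (b > a)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def dbscan(points, eps, min_pts):
    """Density-based clustering (DBSCAN); the label -1 marks noise."""
    labels = [None] * len(points)

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point (not expanded)
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


print(kmeans(POINTS, 3))                   # → [0, 0, 0, 0, 2, 2, 2, 2, 1]
print(single_linkage(POINTS, 3))           # → [0, 0, 0, 0, 1, 1, 1, 1, 2]
print(dbscan(POINTS, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With these settings, DBSCAN labels the outlier (5, 30) as noise without being told how many clusters exist, while k-means and single linkage, both given k = 3, assign it a cluster of its own. The integer labels themselves are arbitrary; only the grouping matters.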
