10 - Cluster Analysis: Basic Concepts and Methods

Jiawei Han, Jian Pei, Micheline Kamber

doi:10.1016/b978-0-12-381479-1.00010-1

Abstract

This chapter presents the basic concepts and methods of cluster analysis. It studies the requirements that clustering methods must meet for massive amounts of data and various applications. Several basic clustering techniques are discussed, organized into the following categories: partitioning methods, hierarchical methods, density-based methods, and grid-based methods. The evaluation of clustering methods is also discussed. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering is the process of grouping a set of physical or abstract data objects into multiple groups, or clusters, so that objects within a cluster have high similarity but are very dissimilar to objects in other clusters. Dissimilarities and similarities are assessed based on the attribute values describing the objects and often involve distance measures. Cluster analysis has extensive applications, including business intelligence, image pattern recognition, Web search, biology, and security. It can be used as a standalone data mining tool to gain insight into the data distribution, or as a preprocessing step for other data mining algorithms that operate on the detected clusters.
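
To make the idea of distance-based grouping concrete, the following is a minimal sketch of a k-means-style procedure, chosen here only as one familiar example of a partitioning method. It assumes numeric attributes, Euclidean distance, and caller-supplied initial centers; the function names (`assign_to_nearest`, `mean_point`, `kmeans`) and the sample data are hypothetical illustrations, not taken from the chapter.

```python
import math


def assign_to_nearest(points, centers):
    """Return, for each point, the index of its closest center (Euclidean)."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def mean_point(points):
    """Attribute-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        labels = assign_to_nearest(points, centers)
        new_centers = []
        for i, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old center if no object was assigned to it.
            new_centers.append(mean_point(members) if members else old_center)
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    return assign_to_nearest(points, centers), centers


# Six 2-D objects forming two visibly separate groups (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

Objects that share a label are closer to their own center than to the other one, which is the "high similarity within a cluster, low similarity across clusters" property described above. The result depends on the choice of distance measure and initial centers, both of which are assumptions of this sketch.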

Full Text