11 - Advanced Cluster Analysis

Jiawei Han, Micheline Kamber, Jian Pei

doi:10.1016/b978-0-12-381479-1.00011-3

Abstract

This chapter discusses advanced topics in cluster analysis. In conventional cluster analysis, an object is assigned to exactly one cluster. In some applications, however, an object needs to be assigned to one or more clusters in a fuzzy or probabilistic way. Fuzzy clustering and probabilistic model-based clustering allow an object to belong to one or more clusters, and a partition matrix records the degree to which each object belongs to each cluster. Probabilistic model-based clustering provides a general framework for deriving clusters in which each object is assigned a probability of belonging to a cluster; it is widely used in data mining applications such as text mining. Clustering methods designed for high-dimensional data are needed when the dimensionality is high and conventional distance measures are dominated by noise. Fundamental methods for cluster analysis on high-dimensional data are introduced. They fall into two major categories: subspace clustering methods, which search for clusters in subspaces of the original space, and dimensionality reduction methods, which create a new space of lower dimensionality and search for clusters there. Graph and network data are increasingly popular in applications such as online social networks, the World Wide Web, and digital libraries. The key issues in clustering graph and network data, including similarity measurement and clustering methods, are studied. In some applications, various constraints may exist. These constraints may arise from background knowledge or from the spatial distribution of the objects. How to conduct cluster analysis under different kinds of constraints is also discussed.

Full Text