A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment

R.J.G.B Campello

doi:10.1016/j.patrec.2006.11.010

Abstract

A fuzzy extension of the Rand index [Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 846–850] is introduced in this paper. The Rand index is a traditional criterion for assessment and comparison of different results provided by classifiers and clustering algorithms. It is able to measure the quality of different hard partitions of a data set from a classification perspective, including partitions with different numbers of classes or clusters. The original Rand index is extended here by making it able to evaluate a fuzzy partition of a data set – provided by a fuzzy clustering algorithm or a classifier with fuzzy-like outputs – against a reference hard partition that encodes the actual (known) data classes. A theoretical formulation based on formal concepts from the fuzzy set theory is derived and used as a basis for the mathematical interpretation of the Fuzzy Rand Index proposed. The fuzzy counterparts of other (five) related indexes, namely, the Adjusted Rand Index of Hubert and Arabie, the Jaccard coefficient, the Minkowski measure, the Fowlkes–Mallows Index, and the Γ statistics, are also derived from this formulation.

Full Text