A combined algorithm for weighting the variables and clustering in the clustering problem

Vladimir J Lumelsky

doi:10.1016/0031-3203(82)90001-2

Abstract

One problem in clustering (classification) analysis is whether the original variables should be transformed in some way before the clustering algorithm uses them. More often than not, the original variables do require some transformation. The purpose of the transformation may be to obtain more compact clusters in the space of the transformed variables, to account for the different nature and/or units of the variables involved, to allow for different or equal 'importance' of different variables, to minimize the number of variables used, etc. Among the linear transformations of variables we distinguish two groups: those that change only the scales of the variables (often called weighting procedures), and those that also rotate the space of variables (a good example is the method of principal components (1)). This paper addresses the former group of transformations. One strong reason for using weighted variables (as opposed to their linear combinations) is that the results of the classification can then be interpreted in terms of the original (physical) variables. Unfortunately, weighting the variables can 'spoil' the compactness of the clusters in the space of the weighted variables if the weighting procedure 'does not care' about the results of clustering, that is, if the weighting is done prior to and independently of the clustering. This paper suggests a method of weighting the variables that is part of the classification procedure itself and thus guarantees an improvement in cluster clarity. The weights of the variables and the clusters of objects produced by the algorithm correspond to a local minimum of a classification criterion. Because of this, the resulting weights can be interpreted as a measure of the 'importance' of the variables for the classification purpose. These weights are compared with popular weighting procedures such as the equal variance (6) and Mahalanobis distance (7) methods. Two examples of the algorithm's performance are presented.
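The abstract does not state the classification criterion or how the weights are normalized, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes the criterion is the weighted within-cluster sum of squares, sum over variables j of w_j * D_j, with the weights constrained so that their product equals 1 (which rules out the trivial all-zero solution). Under those assumptions the procedure alternates k-means-style assignment and center steps with a closed-form weight step, w_j = G / D_j, where G is the geometric mean of the D_j; no step increases the criterion, so the iteration stops at a local minimum. All function names, the deterministic farthest-first seeding, and the `eps` guard are hypothetical choices made for illustration. An equal-variance baseline (w_j = 1 / var_j, computed independently of the clustering) is included for comparison.

```python
import math


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per variable."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def farthest_first_init(X, k):
    """Hypothetical deterministic seeding: start at X[0], then repeatedly
    add the point farthest (unweighted) from the centers chosen so far."""
    unit = [1.0] * len(X[0])
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(weighted_sq_dist(x, c, unit) for c in centers))
        centers.append(list(far))
    return centers


def update_weights(X, labels, centers, eps=1e-12):
    """Minimize sum_j w_j * D_j subject to prod_j w_j = 1 (assumed constraint).

    D_j is the within-cluster sum of squares of variable j. The closed-form
    minimizer is w_j = G / D_j, where G is the geometric mean of the D_j;
    eps guards against a variable with zero within-cluster spread.
    """
    p = len(X[0])
    D = [eps] * p
    for x, lab in zip(X, labels):
        for j in range(p):
            D[j] += (x[j] - centers[lab][j]) ** 2
    log_g = sum(math.log(d) for d in D) / p
    return [math.exp(log_g) / d for d in D]


def combined_weighting_clustering(X, k, max_iter=100):
    """Alternate assignment, center and weight steps; none of them increases
    the weighted within-cluster sum of squares (ignoring the tiny eps term)."""
    w = [1.0] * len(X[0])  # equal weights satisfy prod(w) == 1
    centers = farthest_first_init(X, k)
    labels = None
    for _ in range(max_iter):
        # 1. Assign each object to the nearest center under the current weights.
        new_labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w))
                      for x in X]
        # 2. Recompute centers; per-variable weights do not move the cluster mean.
        for c in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3. Recompute the weights for the fixed partition.
        w = update_weights(X, new_labels, centers)
        if new_labels == labels:  # partition stable, so centers and weights are too
            break
        labels = new_labels
    return labels, w


def equal_variance_weights(X):
    """Baseline: w_j = 1 / var_j, computed before and independently of clustering."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [n / sum((x[j] - means[j]) ** 2 for x in X) for j in range(len(X[0]))]


if __name__ == "__main__":
    # Toy data: variable 0 separates two groups, variable 1 is within-cluster noise.
    X = [[0.0, 0.0], [0.1, 3.0], [0.2, -3.0],
         [10.0, 0.0], [10.1, 3.0], [10.2, -3.0]]
    labels, w = combined_weighting_clustering(X, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(w)  # roughly [30.0, 0.033]
    print(equal_variance_weights(X))  # roughly [0.04, 0.167]
```

On this toy data the combined procedure gives the separating variable the larger weight (about 30 versus 0.033), while equal-variance weighting, which ignores the clusters, gives the larger weight to the noise variable. This is the kind of "spoiling" of cluster compactness the abstract describes. The two sets of weights are normalized differently, so only their relative sizes are comparable.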

Journal: Pattern Recognition
Publication Date: Jan 1, 1982
Citations: 11