Abstract
One problem in cluster (classification) analysis is whether the original variables should be transformed before they are used by the clustering algorithm. More often than not, some transformation is required. It may be motivated by a desire to obtain more compact clusters in the space of the transformed variables, to account for the different nature and/or units of the variables, to give different (or equal) 'importance' to different variables, to minimize the number of variables used, and so on. Among linear transformations of variables we distinguish two groups: those that change only the scales of the variables (often called weighting procedures), and those that also rotate the space of variables (the method of principal components (1) is a good example). This paper addresses the former group. One strong reason for using weighted variables rather than their linear combinations is that the classification results can then be interpreted in terms of the original (physical) variables. Unfortunately, weighting can 'spoil' the compactness of the clusters in the space of the weighted variables if the weighting procedure ignores the results of clustering, that is, if the weighting is done before and independently of the clustering. This paper proposes a method of weighting the variables that is part of the classification procedure itself and therefore guarantees an improvement in cluster clarity. The variable weights and the clusters of objects produced by the algorithm correspond to a local minimum of a classification criterion, so the resulting weights can be interpreted as a measure of the 'importance' of each variable for the classification.
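The abstract does not state the classification criterion or the update rules, so the following is only a minimal sketch of the general idea under stated assumptions: a k-means-style loop in which one weight per variable is learned together with the clusters. The criterion, the exponent beta, the farthest-point initialisation and all function names (wdist, weighted_kmeans) are hypothetical choices made for illustration, not the paper's method. Each step minimises the assumed criterion over one group of unknowns with the others held fixed, so the criterion never increases and the loop stops at a local minimum in that coordinate-wise sense.

```python
def wdist(x, c, w, beta):
    """Weighted squared Euclidean distance: sum_j w_j**beta * (x_j - c_j)**2."""
    return sum(wj ** beta * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c))


def weighted_kmeans(X, k, beta=2.0, n_iter=100, eps=1e-12):
    """Cluster X into k groups while learning one weight per variable.

    Assumed criterion (illustrative, not taken from the paper):
        J(labels, centers, w) = sum_i wdist(X[i], centers[labels[i]], w, beta)
    subject to w_j >= 0, sum_j w_j = 1 and beta > 1.
    """
    p = len(X[0])
    w = [1.0 / p] * p  # start from equal weights
    # Deterministic farthest-point initialisation of the k centers.
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(wdist(x, c, w, beta) for c in centers))
        centers.append(list(far))
    prev = None
    for _ in range(n_iter):
        # 1. Assign each object to its nearest center under the weighted distance.
        labels = [min(range(k), key=lambda c: wdist(x, centers[c], w, beta)) for x in X]
        if labels == prev:
            break  # same partition => same centers and weights: fixed point reached
        prev = labels
        # 2. Centers become cluster means (an empty cluster keeps its old center).
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3. Closed-form weight update minimising J for fixed labels and centers:
        #    w_j is proportional to D_j ** (-1 / (beta - 1)), where D_j is the
        #    within-cluster dispersion of variable j (floored at eps to avoid 1/0).
        D = [max(eps, sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(X, labels)))
             for j in range(p)]
        inv = [d ** (-1.0 / (beta - 1.0)) for d in D]
        total = sum(inv)
        w = [v / total for v in inv]
    return labels, centers, w


# Toy data: variable 0 separates two groups, variable 1 is within-group noise.
X = [[0.0, 0.0], [0.5, 3.0], [1.0, 1.5], [10.0, 0.0], [10.5, 3.0], [11.0, 1.5]]
labels, centers, w = weighted_kmeans(X, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(w)       # approximately [0.9, 0.1]
```

With beta = 2 each weight is inversely proportional to the within-cluster dispersion of its variable, so a variable that separates the clusters well (small spread inside each cluster) receives a large weight. Because D_j depends on the units of variable j, the weights learned by this sketch are not unit-free.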
The proposed weights are compared with popular weighting procedures such as the equal-variance (6) and Mahalanobis distance (7) methods. Two examples of the algorithm's performance are presented.
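A minimal sketch of the two baseline procedures, assuming that 'equal variance' means dividing each variable by its sample standard deviation and that the Mahalanobis distance uses the total sample covariance matrix of all objects (the abstract does not say which covariance matrix reference (7) uses). All function names are hypothetical. The sketch also illustrates the distinction drawn above: equal-variance weighting only rescales the axes, while the Mahalanobis metric also accounts for correlations between variables and coincides with equal-variance weighting when the covariance matrix is diagonal.

```python
import math


def column_stats(X):
    """Return the column means and the sample covariance matrix (divisor n - 1)."""
    n, p = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(p)]
    cov = [[sum((x[a] - means[a]) * (x[b] - means[b]) for x in X) / (n - 1)
            for b in range(p)] for a in range(p)]
    return means, cov


def equal_variance_weights(X):
    """Weight w_j = 1 / s_j, so every weighted variable has unit variance."""
    _, cov = column_stats(X)
    return [1.0 / math.sqrt(cov[j][j]) for j in range(len(cov))]


def weighted_sq(x, y, w):
    """Squared Euclidean distance after rescaling variable j by w_j."""
    return sum((wj * (a - b)) ** 2 for wj, a, b in zip(w, x, y))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][c] * z[c] for c in range(r + 1, m))) / M[r][r]
    return z


def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y), with S = cov."""
    d = [a - b for a, b in zip(x, y)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))
```

For the toy data X = [[0, 0], [2, 0], [0, 4], [2, 4]] the two variables are uncorrelated, so the squared distance between the first and last objects is 6 under both measures; once the variables are correlated, the Mahalanobis distance generally differs from the equal-variance one.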
More From: Pattern Recognition