Abstract
In clustering analysis, data attributes may have different contributions to the detection of various clusters. To solve this problem, the subspace clustering technique has been developed, which aims at grouping the data objects into clusters based on the subsets of attributes rather than the entire data space. However, the most existing subspace clustering methods are only applicable to either numerical or categorical data, but not both. This paper, therefore, studies the soft subspace clustering of data with both of the numerical and categorical attributes (also simply called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for the numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both of intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is further introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure as well as the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is also presented, which can effectively improve the stability and accuracy of -means-type clustering methods on numerical, categorical, and mixed data. The experimental results on different benchmark data sets show the efficacy of the proposed approach.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Neural Networks and Learning Systems
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.