Abstract

Clustering is an important technique for data analysis, but cluster analysis of mixed data, which has both numeric and categorical attributes, remains challenging. This paper proposes a mixed data clustering algorithm, MCFCIW, that combines a noise-filtered distribution centroid with an iterative weight adjustment strategy. The algorithm defines a noise-filtered distribution centroid for categorical attributes and combines it with the mean of the numeric attributes to represent the center of a cluster with mixed attributes. The noise-filtered distribution centroid records the frequency of occurrence of each possible value of a categorical attribute in a cluster more accurately. Furthermore, because the “noise values” are filtered out, the measure of dissimilarity between data objects and cluster centers can be improved. In addition, the algorithm introduces an iterative weight adjustment strategy that combines intra-cluster and inter-cluster information, using a unified weight measure for both numeric and categorical attributes. Attributes with higher intra-cluster homogeneity and inter-cluster heterogeneity are given higher priority and tend to be assigned relatively heavier weights during clustering. Experimental results on several datasets from the UCI repository show that MCFCIW outperforms existing partition-based clustering algorithms and clustering algorithms based on data conversion for mixed data on both cluster validity indices and convergence speed.
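
To make the idea of a noise-filtered distribution centroid more concrete, the following Python sketch shows one way such a cluster center and the resulting mixed dissimilarity might be computed. The abstract does not state the filtering rule, the dissimilarity formula, or the weight update, so everything specific here is an assumption made for illustration: the function names, the `noise_threshold` parameter, the rule of dropping values whose relative frequency falls below that threshold, and the weighted sum of squared numeric differences plus categorical mismatch. It should not be read as the paper's actual method.

```python
from collections import Counter


def distribution_centroid(values, noise_threshold=0.1):
    """Relative frequency of each categorical value in one cluster.

    Assumption: values rarer than noise_threshold are treated as
    "noise values" and dropped, and the remaining frequencies are
    renormalised to sum to 1. The paper may filter differently.
    """
    counts = Counter(values)
    total = len(values)
    freqs = {v: c / total for v, c in counts.items()}
    kept = {v: f for v, f in freqs.items() if f >= noise_threshold}
    if not kept:
        # Every value fell below the threshold: keep the raw distribution.
        kept = freqs
    norm = sum(kept.values())
    return {v: f / norm for v, f in kept.items()}


def categorical_dissimilarity(value, centroid):
    """Hypothetical measure: 1 minus the centroid frequency of the value."""
    return 1.0 - centroid.get(value, 0.0)


def mixed_dissimilarity(num_obj, cat_obj, num_mean, cat_centroids,
                        num_weights, cat_weights):
    """Weighted distance between one object and one mixed cluster center.

    Numeric attributes are compared with the cluster mean, categorical
    attributes with their distribution centroid. The per-attribute
    weights are taken as given; the iterative weight adjustment itself
    is not sketched here.
    """
    num_part = sum(w * (x - m) ** 2
                   for w, x, m in zip(num_weights, num_obj, num_mean))
    cat_part = sum(w * categorical_dissimilarity(v, c)
                   for w, v, c in zip(cat_weights, cat_obj, cat_centroids))
    return num_part + cat_part


# One categorical attribute observed in a cluster of ten objects.
colors = ["red"] * 6 + ["blue"] * 3 + ["green"]
centroid = distribution_centroid(colors, noise_threshold=0.2)
```

With these assumed settings, "green" (relative frequency 0.1) is filtered out and the centroid holds only "red" and "blue", so an object whose value is "green" receives the maximum categorical dissimilarity of 1 to this cluster.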
