Clustering of mixed-type data considering concept hierarchies: problem specification and algorithm

Sahar Behzadi, Claudia Plant, Christian Böhm, Nikola S Müller

doi:10.1007/s41060-020-00216-2

Abstract

Most clustering algorithms have been designed only for purely numerical or purely categorical data sets, while many applications today generate mixed data. This raises the question of how to integrate various types of attributes so that objects can be grouped efficiently without loss of information. It is already well understood that simply converting categorical attributes into a numerical domain is not sufficient, since relationships between values, such as a certain order, are artificially introduced. Concept trees summarize the categorical attributes by leveraging the natural conceptual hierarchy among categorical values. In this paper, we introduce the algorithm ClicoT (CLustering mixed-type data Including COncept Trees), as reported by Behzadi et al. (Advances in Knowledge Discovery and Data Mining, Springer, Cham, 2019), which is based on the minimum description length (MDL) principle. Exploiting the concept hierarchies, ClicoT integrates categorical and numerical attributes by means of an MDL-based objective function. The result of ClicoT is well interpretable, since concept trees provide insights into categorical data. Extensive experiments on synthetic and real data sets illustrate that ClicoT is noise-robust and yields well-interpretable results in a short runtime. Moreover, we investigate the impact of concept hierarchies as well as various data characteristics in this paper.
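The point about artificially introduced order can be illustrated with a small sketch. It is not taken from the paper and does not reproduce how ClicoT uses concept trees; the colour hierarchy, the alphabetical integer coding, and the helper names `tree_distance` and `integer_distance` are all assumptions made for illustration. The sketch only shows that an integer coding treats values from different branches of a hierarchy as being just as close as values from the same branch, while a measure that follows the tree does not.

```python
# Hypothetical concept tree for a "colour" attribute, stored as child -> parent.
# The hierarchy itself is an assumption made for this illustration.
PARENT = {
    "warm": "colour", "cool": "colour",
    "red": "warm", "orange": "warm",
    "blue": "cool", "green": "cool",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root of the tree."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def tree_distance(a, b):
    """Number of edges on the path between a and b in the concept tree."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {n: i for i, n in enumerate(path_b)}
    for i, n in enumerate(path_a):
        if n in depth_in_b:  # first common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes are not in the same tree")

# Naive conversion: alphabetical integer codes (blue=0, green=1, orange=2, red=3).
CODES = {v: i for i, v in enumerate(sorted(["red", "orange", "blue", "green"]))}

def integer_distance(a, b):
    """Distance implied by the artificial ordering of the integer codes."""
    return abs(CODES[a] - CODES[b])

if __name__ == "__main__":
    for pair in [("orange", "red"), ("green", "orange")]:
        print(pair, integer_distance(*pair), tree_distance(*pair))
```

Under these assumptions, the integer coding places "green" and "orange" at the same distance as "orange" and "red", although only the latter pair shares a parent concept in the tree.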

Highlights

Declarations: Availability of data and material. We used the miles per gallon (MPG), Automobile and Adult data sets from the UCI Public Data Repository [7] as well as the Airport data set from the public project Open Flights (http://openflights.org/data.html).
Clustering mixed data is a non-trivial task and is typically not achieved by well-known clustering algorithms designed for a specific type.
Information-theoretic approaches have been proposed to avoid the difficulty of estimating input parameters. These algorithms regard clustering as a data compression problem by employing the minimum description length (MDL) principle.
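The idea of treating clustering as data compression can be sketched as follows. This is a generic MDL-style coding cost, not the objective function of ClicoT: it assumes a Gaussian code for the numerical attribute and a frequency-based code for the categorical attribute, uses no concept tree, and ignores the cost of encoding the model parameters. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def categorical_cost(values):
    """Bits needed to encode the values with a code based on their frequencies."""
    counts = Counter(values)
    n = len(values)
    return sum(-math.log2(counts[v] / n) for v in values)

def numerical_cost(xs, min_sigma=1e-3):
    """Negative log2-likelihood of xs under a Gaussian fitted to xs.

    The value can be negative because densities may exceed 1; only
    differences between clusterings are meaningful in this sketch.
    """
    mu = mean(xs)
    sigma = max(pstdev(xs), min_sigma)  # floor avoids log(0) for constant data
    def neg_log2_pdf(x):
        z = (x - mu) / sigma
        return 0.5 * z * z / math.log(2) + math.log2(sigma * math.sqrt(2 * math.pi))
    return sum(neg_log2_pdf(x) for x in xs)

def clustering_cost(clusters):
    """Total cost of a clustering; each cluster is a list of (number, category)."""
    total_points = sum(len(c) for c in clusters)
    cost = 0.0
    for c in clusters:
        cost += len(c) * -math.log2(len(c) / total_points)  # cluster-ID bits
        cost += numerical_cost([x for x, _ in c])
        cost += categorical_cost([v for _, v in c])
    return cost

# Toy mixed-type data (assumed): two well-separated groups.
cluster_a = [(1.0, "x"), (1.2, "x"), (0.8, "x"), (1.1, "y")]
cluster_b = [(10.0, "z"), (10.3, "z"), (9.7, "z"), (10.1, "z")]

if __name__ == "__main__":
    print("one cluster :", clustering_cost([cluster_a + cluster_b]))
    print("two clusters:", clustering_cost([cluster_a, cluster_b]))
```

On this toy data the two-cluster description is cheaper than the single-cluster one, which is the sense in which a good clustering compresses the data. No input parameters such as the number of clusters are needed to compare the two descriptions.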

Summary

Declarations

– Availability of data and material: We used the MPG, Automobile and Adult data sets from the UCI Public Data Repository [7] as well as the Airport data set from the public project Open Flights (http://openflights.org/data.html).
– Code availability: Our algorithm is implemented in Java, and the source code as well as the data sets are publicly available here: https://tinyurl.com/ucp8289

Introduction

Clustering mixed data types

Concept hierarchy

Cluster-specific elements

Integrative objective function

Objective

Algorithm

How to specify cluster-specific elements?

Probability adjustment

ClicoT algorithm

Update each attribute of C_i

Related work

Evaluation

Mixed-type clustering of synthetic data

Experiments on real-world data

Conclusion

Journal: International Journal of Data Science and Analytics
Publication Date: Apr 25, 2020
Citations: 9
License type: open-access

Similar Papers

Clustering of Mixed-Type Data Considering Concept Hierarchies
Sahar Behzadi ... Claudia Plant
01 Jan 2019

An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood
Shifei Ding ... Yu Xue
Knowledge-Based Systems | VOL. 133
21 Jul 2017

A Unified Metric for Categorical and Numerical Attributes in Data Clustering
Yiu-Ming Cheung ... Hong Jia
01 Jan 2013

Parameter Free Mixed-Type Density-Based Clustering
Sahar Behzadi ... Claudia Plant
01 Jan 2018
