MMDBC: Density-Based Clustering Algorithm for Mixed Attributes and Multi-dimension Data

Haizhou Du, Shuqing Zeng, Wei Fang, Haining Huang

doi:10.1109/bigcomp.2018.00093

Abstract

Multidimensional data with mixed attributes, containing both categorical and numerical attributes, have become ubiquitous in the real world. However, most existing clustering methods apply to either numerical or categorical data, but not both. In this paper we present an approach that extends density-based clustering to categorical domains and to domains with mixed numerical and categorical attributes. Datasets are classified into three types according to the proportion of categorical and numerical attributes: categorical priority, numerical priority, and equivalent priority. A separate mathematical model is built for each case. Our approach produces the final result once all range queries on the numerical data have been executed and the weights of the categorical attributes have been merged in. We use a real dataset containing large, complex data to demonstrate clustering performance, and the well-known KDD CUP-99 and Adult datasets for comparison. The results show that our approach is fast, efficient, and robust.
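To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the priority type is decided by the share of categorical attributes, with a hypothetical `tolerance` band around 50%, and that categorical attributes contribute to a distance as a weighted mismatch count added to the Euclidean distance on the numerical attributes. The function names, the `tolerance` value, and the `cat_weight` parameter are illustrative assumptions and do not come from the paper.

```python
from math import sqrt


def priority_type(n_numeric, n_categorical, tolerance=0.1):
    """Label a dataset by its share of categorical attributes.

    `tolerance` is an assumed band around 50%; the abstract gives no threshold.
    """
    total = n_numeric + n_categorical
    if total == 0:
        raise ValueError("dataset has no attributes")
    share = n_categorical / total
    if share > 0.5 + tolerance:
        return "categorical priority"
    if share < 0.5 - tolerance:
        return "numerical priority"
    return "equivalent priority"


def mixed_distance(a_num, b_num, a_cat, b_cat, cat_weight=1.0):
    """Euclidean distance on the numerical parts plus a weighted
    count of categorical mismatches (an assumed way to merge the two)."""
    numeric = sqrt(sum((x - y) ** 2 for x, y in zip(a_num, b_num)))
    mismatches = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + cat_weight * mismatches
```

A density-based algorithm could then use `mixed_distance` in its neighborhood (range) queries, with `cat_weight` chosen according to the label returned by `priority_type`; how the paper actually sets these weights is not stated in the abstract.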

Full Text