Abstract

Multidimensional data with mixed attributes, containing both categorical and numerical attributes, have become ubiquitous in the real world. However, most existing clustering methods apply to either numerical or categorical data, but not both. In this paper we present an approach that extends density-based clustering to categorical domains and to domains with mixed numerical and categorical attributes. Datasets are classified into three types according to the proportion of attributes: categorical priority, numerical priority, and equivalent priority. A separate mathematical model is built for each case. Our approach produces the final result once all range queries on the numerical data have been executed and the weights of the categorical attributes have been merged. We use a real dataset containing large, complex data to demonstrate the clustering performance. Moreover, the well-known KDD CUP-99 and Adult datasets are used for comparison. The results show that our approach is fast, efficient, and robust.
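As a rough illustration of the ideas named in the abstract, the sketch below labels a dataset by its attribute proportion and runs a range query over mixed records. The abstract does not give the actual models, so everything here is an assumption: the function names, the rule that a simple count comparison decides the priority type, and the distance itself (Euclidean on numerical attributes plus a weighted mismatch count on categorical ones, with a hypothetical `cat_weight` parameter).

```python
from math import sqrt


def priority_type(n_numeric, n_categorical):
    """Label a dataset by which kind of attribute is more numerous.

    Assumption: a plain count comparison; the paper may use another rule.
    """
    if n_categorical > n_numeric:
        return "categorical priority"
    if n_numeric > n_categorical:
        return "numerical priority"
    return "equivalent priority"


def mixed_distance(a, b, numeric_idx, categorical_idx, cat_weight):
    """Assumed mixed distance, not the paper's model.

    Euclidean distance over the numerical attributes plus cat_weight
    times the number of categorical attributes whose values differ.
    """
    numeric_part = sqrt(sum((a[i] - b[i]) ** 2 for i in numeric_idx))
    mismatches = sum(1 for i in categorical_idx if a[i] != b[i])
    return numeric_part + cat_weight * mismatches


def range_query(points, center, eps, numeric_idx, categorical_idx, cat_weight):
    """Return indices of all points within eps of points[center]."""
    return [
        j
        for j, q in enumerate(points)
        if mixed_distance(points[center], q, numeric_idx,
                          categorical_idx, cat_weight) <= eps
    ]


# Toy records: two numerical attributes followed by one categorical one.
records = [
    (0.0, 0.0, "tcp"),
    (0.1, 0.0, "tcp"),
    (0.1, 0.0, "udp"),
    (3.0, 4.0, "tcp"),
]
neighbours = range_query(records, 0, eps=0.5,
                         numeric_idx=[0, 1], categorical_idx=[2],
                         cat_weight=1.0)
```

With these toy records, only the first two points fall in the neighbourhood of the first record: the third differs in its categorical value, which adds the full `cat_weight` to its distance. A density-based method would then grow clusters from points whose neighbourhoods are large enough.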
