Abstract

Mixed data, which contains both categorical and continuous attributes, is a typical heterogeneous structure describing the elements of complex systems. To analyze the similarity of mixed data effectively while accounting for the heterogeneous coupling relationships among mixed attributes, this work proposes a heterogeneous coupling relationship-based similarity measure for mixed data (HMS). First, continuous attributes are converted into categorical attributes through automatic discretization based on K-means, yielding a categorical data view whose similarity is measured with the HGS method. Second, categorical attributes are converted into continuous attributes through similarity representation, yielding a continuous data view whose similarity is measured by Euclidean distance. The integrated similarity is then obtained as the harmonic mean of the similarities from the two views. Finally, the effectiveness and feasibility of the HMS method are verified in clustering experiments. Compared with other commonly used similarity measures for mixed data, the HMS method more fully captures the categorical and continuous views of the mixed-type attributes of complex system elements, as well as the heterogeneous relationships between and within the views, which greatly improves clustering performance.
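The two-view pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the HGS measure is not specified in this abstract, so a simple overlap (matching) similarity stands in for it; the Euclidean distance d is mapped to a similarity as 1 / (1 + d); the categorical-to-continuous "similarity representation" is not reproduced, so the caller supplies the continuous view directly; and all function names (`kmeans_1d`, `overlap_similarity`, `euclidean_similarity`, `harmonic_mean`, `integrated_similarity`) are hypothetical.

```python
import math


def kmeans_1d(values, k, iters=50):
    """Discretize one continuous attribute with 1-D K-means (Lloyd's algorithm).

    Returns one cluster label per input value.
    """
    lo, hi = min(values), max(values)
    # Deterministic initialization: k centers spread evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for each value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster (empty clusters keep their center).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def overlap_similarity(a, b):
    """Stand-in for HGS (assumption): fraction of attributes with equal values."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def euclidean_similarity(a, b):
    """Assumed mapping from Euclidean distance d to similarity: 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(a, b))


def harmonic_mean(s1, s2):
    """Harmonic mean of two non-negative similarities (0 if both are 0)."""
    return 0.0 if s1 + s2 == 0 else 2 * s1 * s2 / (s1 + s2)


def integrated_similarity(cat_a, cat_b, num_a, num_b):
    """Fuse the categorical-view and continuous-view similarities of two objects."""
    s_cat = overlap_similarity(cat_a, cat_b)
    s_num = euclidean_similarity(num_a, num_b)
    return harmonic_mean(s_cat, s_num)
```

In this sketch, the labels returned by `kmeans_1d` for each continuous attribute would be appended to an object's original categorical values to form its categorical view. The harmonic mean is dominated by the smaller of the two similarities, so two objects score as similar only when both views agree.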
