Abstract

Graphs are commonly used to express the communication of various data. Faced with uncertain data, we have probabilistic graphs. As a fundamental problem of such graphs, clustering has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate ensemble clusters, we develop a set of probable possible worlds of the initial probabilistic graph. Then, we present a probabilistic co-association matrix as a consensus function to integrate base clustering results. It relies on co-occurrences of node pairs based on the probability of the corresponding common cluster graphs. Also, we apply two improvements in the steps before and after of ensembles generation. In the before step, we append neighborhood information based on node features to the initial graph to achieve a more accurate estimation of the probability between the nodes. In the after step, we use supervised metric learning-based Mahalanobis distance to automatically learn a metric from ensemble clusters. It aims to gain crucial features of the base clustering results. We evaluate our work using five real-world datasets and three clustering evaluation metrics, namely the Dunn index, Davies–Bouldin index, and Silhouette coefficient. The results show the impressive performance of clustering large probabilistic graphs.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.