Abstract

We provide a distributed method to partition a large set of data into clusters characterized by small in-group and large out-group distances. We assume a wireless sensor network in which each sensor is given a large set of data, and the objective is to group the sensors into homogeneous clusters by information type. In previous literature, the desired number of clusters must be specified a priori by the user. In our approach, the cluster centroids are constrained to lie at a distance of at least ε from one another, and the desired number of clusters is not specified. Although traditional algorithms fail to solve the problem with this constraint, it can help obtain a better clustering. In this paper, a solution based on the Hegselmann-Krause opinion dynamics model is proposed to find an admissible, although suboptimal, solution. The Hegselmann-Krause model is a centralized algorithm; here we provide a distributed implementation based on a combination of distributed consensus algorithms. A comparison with the k-means algorithm concludes the paper.

Highlights

  • The problem of grouping large amounts of data into a small number of subsets with some common features among the elements has attracted the work of several researchers in different fields, ranging from statistics to image analysis and bioinformatics [1,2,3]. Data clustering techniques are developed to partition an initial set of observation data into collections with small in-group distances and large out-group distances. Among the existing techniques, one of the most used is the k-means algorithm or its successive extensions.

  • Distance-constrained data clustering approaches have been devised in the literature: in [7, 8] the considered constraints are the so-called must-link and cannot-link constraints; in [9] the feasibility of a constrained problem involving the so-called δ-constraints and ε-constraints is given.

  • The outline of the paper is as follows: after some preliminaries, in Section 2 we provide a formulation of the problem at hand, and in Section 3 we analyze the formulation of the standard data clustering problem and the k-means algorithm; in Section 4 the HK opinion dynamics model is reviewed, while in Section 5 the distributed consensus algorithms are examined; Section 6 outlines the proposed approach to solve the distance-constrained clustering problem, while Section 7 addresses the distributed implementation of the HK opinion dynamics model; Section 8 contains some numeric examples to show the potential of this method, while Section 9 contains some concluding remarks.

Summary

Introduction

The problem of grouping large amounts of data into a small number of subsets with some common features among the elements (often referred to as the data clustering problem) has attracted the work of several researchers in different fields, ranging from statistics to image analysis and bioinformatics [1,2,3].

International Journal of Distributed Sensor Networks

[...] cluster centroids, while this class of constraints might help in choosing the number k when such a value is a priori unknown. This problem has a particular relevance in a distributed setting, where a network of sensors has to classify information provided by several sensors without a central authority, using only local data exchange among neighbors in the network.

Rather than striving for a complete agreement among the agents, we exploit the peculiarity of the HK model of generating several clusters, with the aim of mapping a large set of measurement data into a few values (i.e., the opinion clusters). In this view, as will be discussed in the following, HK models can be seen as a powerful methodology to determine the number of clusters while respecting the constraints on the distance among cluster centroids.

A graph G is connected if for any Vi, Vj ∈ V there is a path whose endpoints are Vi and Vj.
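To illustrate how an HK model can map many values into a few opinion clusters whose centroids are more than ε apart, the following is a minimal centralized sketch for scalar opinions. It assumes the standard synchronous HK update, in which each agent moves to the mean of all opinions within a confidence radius ε of its own. The names `hk_step` and `hk_clusters`, the tolerances, and the sample data are hypothetical choices made for illustration; this is not the distributed implementation proposed in the paper.

```python
import numpy as np


def hk_step(x, eps):
    """One synchronous Hegselmann-Krause update for scalar opinions (sketch)."""
    # neighbors[i, j] is 1.0 when agent j's opinion is within eps of agent i's
    # (every agent is always a neighbor of itself, so row sums are >= 1).
    neighbors = (np.abs(x[:, None] - x[None, :]) <= eps).astype(float)
    # Each agent moves to the mean of the opinions in its confidence set.
    return (neighbors @ x) / neighbors.sum(axis=1)


def hk_clusters(x0, eps, max_iter=1000, tol=1e-9):
    """Iterate the HK update until opinions stop changing, then list the clusters.

    max_iter and tol are illustrative assumptions, not values from the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = hk_step(x, eps)
        converged = np.max(np.abs(x_next - x)) < tol
        x = x_next
        if converged:
            break
    # Collect distinct final opinions (up to a small tolerance) as centroids.
    centroids = []
    for value in np.sort(x):
        if not centroids or value - centroids[-1] > 1e-6:
            centroids.append(value)
    return x, np.array(centroids)


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]  # hypothetical measurements
    final_opinions, centroids = hk_clusters(data, eps=0.3)
    print("final opinions:", final_opinions)
    print("cluster centroids:", centroids)
```

With these sample values the six opinions collapse onto two values, near 0.1 and 1.1, without the number of clusters being specified in advance. At a steady state of this update, two distinct clusters cannot lie within ε of each other, since otherwise they would still influence one another.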

Problem Statement
Data Clustering
Hegselmann-Krause Opinion Dynamics Model
Consensus Algorithms
Data Clustering with Distance Constraints via Opinion Dynamics and k-Means
Distributed HK Opinion Dynamics
Numeric Examples
Conclusions and Future Work