Abstract

In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potentially very large scale of such datasets/graphs, which tests the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan–Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of the edge-betweenness centrality needed in the GN algorithm. This allows the nodes of the data graph to be clustered more efficiently in terms of modularity, without a considerable loss of accuracy. To study how our approach enhances GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, as well as frequently employed benchmark datasets to demonstrate its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework to multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can indeed be used for community detection/data clustering and exploited in various other IoT applications, such as more energy-efficient smart-city/building sensing.
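
The abstract does not state how the data graph is built from the multi-dimensional measurements. As a minimal sketch, assuming a simple threshold rule (an illustrative choice, not the authors' construction), each reading becomes a node, every dimension is min-max scaled so that no unit dominates, and two readings are linked when their scaled Euclidean distance is at most a hypothetical parameter `radius`. The names `min_max_scale` and `build_data_graph` and the example readings are hypothetical.

```python
import math
from itertools import combinations

def min_max_scale(samples):
    """Rescale every dimension to [0, 1] so that no measurement unit dominates."""
    lows = [min(col) for col in zip(*samples)]
    highs = [max(col) for col in zip(*samples)]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(s, lows, highs))
        for s in samples
    ]

def build_data_graph(samples, radius):
    """Hypothetical rule: link two readings whose scaled Euclidean distance <= radius.
    Returns an adjacency map {node index: set(neighbor indices)}."""
    scaled = min_max_scale(samples)
    adj = {i: set() for i in range(len(samples))}
    for i, j in combinations(range(len(samples)), 2):
        if math.dist(scaled[i], scaled[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Illustrative (temperature in °C, relative humidity in %) readings from two zones
readings = [(20.0, 40.0), (21.0, 42.0), (20.5, 41.0),
            (28.0, 60.0), (29.0, 62.0), (28.5, 61.0)]
graph = build_data_graph(readings, radius=0.3)
print(graph)  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Here the two groups of similar readings form two separate triangles, i.e., the node clusters that community detection on the data graph is expected to recover. Real data graphs would rarely split this cleanly, which is why a modularity-driven method such as GN is needed.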

Highlights

  • Sensor networks in future smart-cities/buildings will be larger and more heterogeneous, collecting information from radically different services and forming more complex topologies

  • We propose a modification of the Girvan–Newman (GN) community detection algorithm [8] via hyperbolic network embedding, making it suitable for big sensory data clustering

  • We study the performance of our approach on various types of artificial networks and benchmark datasets, including topologies from real social networks, showing that the hyperbolic GN method can be used for community detection and data clustering in various scenarios
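
For reference, the classic GN baseline that the highlights propose to modify repeatedly removes the edge with the highest edge-betweenness centrality (EBC) and keeps the split with the best modularity. The sketch below is a plain, unoptimized Python version of that baseline, computing exact EBC with Brandes' algorithm; it assumes an unweighted, undirected simple graph given as a dict of neighbor sets (no self-loops). It contains none of the hyperbolic enhancement: the exact, all-pairs EBC recomputed after every edge removal is precisely the costly step that the proposed approach approximates.

```python
from collections import deque

def edge_betweenness(adj):
    """Exact EBC via Brandes' algorithm (unweighted, undirected simple graph)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        preds = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair (s, t) was counted once from each endpoint
    return {e: b / 2 for e, b in eb.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def modularity(adj, communities):
    """Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2] on the original graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        l_c = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        d_c = sum(len(adj[u]) for u in comm)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

def girvan_newman(adj):
    """Remove the highest-EBC edge until no edges remain; keep the best-modularity split."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        eb = edge_betweenness(work)  # recomputed after every removal: the costly step
        u, v = tuple(max(eb, key=eb.get))
        work[u].discard(v)
        work[v].discard(u)
        comps = components(work)
        q = modularity(adj, comps)
        if q > best_q:
            best, best_q = comps, q
    return best, best_q

# Two triangles joined by the bridge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best, q = girvan_newman(adj)
print(best, round(q, 3))  # [{0, 1, 2}, {3, 4, 5}] 0.357
```

The bridge carries all 9 cross-triangle shortest paths, so it is removed first, and the resulting two-triangle split (Q = 5/14) is never improved upon by later removals.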

Introduction

Sensor networks in future smart-cities/buildings will be larger and more heterogeneous, collecting information from radically different services and forming more complex topologies. Both large network topologies and large datasets must therefore be analyzed efficiently. Big sensory data, such as those obtained in smart-city/building networks [4], vary in volume, type and time scale, frequently forming multi-dimensional datasets, where co-located measurements of diverse types constitute complex data of multiple dimensions. As the volume of the generated sensor measurements increases at unprecedented scales, sometimes on the order of petabytes [6], data clustering techniques will require fundamental enhancements to ensure their sustainability. Various directions for this have recently been pinpointed. We propose a modification of the Girvan–Newman (GN) community detection algorithm [8] via hyperbolic network embedding, making it suitable for big sensory data clustering. This enhancement of GN is based on a new approximation approach for the computation of edge-betweenness centrality (EBC), where node distances are computed in a graph embedded in hyperbolic space.
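
The summary does not detail how the hyperbolic node distances enter the EBC approximation, nor how Rigel places the nodes. Purely as an illustration, the sketch below assumes the nodes already carry coordinates in the hyperboloid model of curvature -1 (hand-picked here, not computed by Rigel) and scores each edge by how many node pairs a greedy forwarding rule routes over it: from the current node, hop to the neighbor with the smallest hyperbolic distance to the target, and give up if no neighbor is closer. This greedy-routing proxy is a swapped-in technique for illustration, not the authors' HEBC definition; `hyperbolic_distance`, `greedy_path`, `approx_edge_betweenness` and the optional pair sampling (`sample`, `seed`) are hypothetical.

```python
import math
import random
from itertools import combinations

def hyperbolic_distance(x, y):
    """Distance in the hyperboloid model of curvature -1. A point is given by its
    spatial coordinates x in R^n; its time-like coordinate is sqrt(1 + |x|^2)."""
    x0 = math.sqrt(1.0 + sum(a * a for a in x))
    y0 = math.sqrt(1.0 + sum(b * b for b in y))
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp at 1: rounding can push the argument just below 1 for identical points
    return math.acosh(max(1.0, x0 * y0 - dot))

def greedy_path(adj, coords, s, t):
    """Greedy forwarding from s to t: always hop to the neighbor closest to t in
    hyperbolic space. Returns the traversed edges, or None if routing gets stuck."""
    path, cur = [], s
    while cur != t:
        nxt = min(adj[cur], key=lambda n: hyperbolic_distance(coords[n], coords[t]), default=None)
        if nxt is None or (hyperbolic_distance(coords[nxt], coords[t])
                           >= hyperbolic_distance(coords[cur], coords[t])):
            return None  # no neighbor makes progress towards t
        path.append(frozenset((cur, nxt)))
        cur = nxt
    return path

def approx_edge_betweenness(adj, coords, sample=None, seed=0):
    """Hypothetical EBC proxy: count how many (optionally sampled) node pairs are
    greedily routed over each edge, instead of enumerating all shortest paths."""
    pairs = list(combinations(adj, 2))
    if sample is not None and sample < len(pairs):
        pairs = random.Random(seed).sample(pairs, sample)
    score = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for s, t in pairs:
        path = greedy_path(adj, coords, s, t)
        if path is not None:
            for edge in path:
                score[edge] += 1
    return score

# Two triangles joined by the bridge 2-3, with hand-picked (not Rigel) coordinates
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
coords = {0: (-2.0, 0.5), 1: (-2.0, -0.5), 2: (-1.0, 0.0),
          3: (1.0, 0.0), 4: (2.0, 0.5), 5: (2.0, -0.5)}
score = approx_edge_betweenness(adj, coords)
print(score[frozenset((2, 3))])  # 9
```

On this toy graph every pair is routed successfully and the bridge collects all 9 cross-triangle pairs, the same value as its exact EBC. In general a greedy proxy need not match exact EBC; how closely it tracks it depends on the quality of the embedding.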

Background and Related Work
Contribution
Community Detection Enhancement via Hyperbolic Network Embedding
Hyperbolic Edge-Betweenness Centrality
Big Sensor Data Clustering
Evaluation
Evaluation and Performance Assessment of HEBC Computation
Known Communities
Unknown Communities
Real Evaluation on FIESTA-IoT Datasets
Conclusions