Dynamic Diverse Summarisation in Heterogeneous Graph Streams: a Comparison between Thesaurus/Ontology-based and Embeddings-based Approaches

Smart environments such as Smart Cities and the Internet of Things are attracting considerable attention. These environments generate data streams that can be represented as graphs and analysed in real time to satisfy user or application needs. The challenges of these real-world graph streams, including their dynamism, heterogeneity, continuity, and high volume, create new requirements for graph processing algorithms. We propose a dynamic graph stream summarisation system that uses embeddings to provide expressive graphs while ensuring high usability and limited resource usage. In this paper, we compare the performance of our embeddings-based approach with that of an existing thesaurus/ontology-based approach (FACES), which we adapted to a dynamic environment using windows and data fusion. Both approaches use conceptual clustering and top-k scoring, which can produce expressive, dynamic graph summaries with limited resources. Evaluations show that, compared to sending all available information without top-k scoring, sending top-k fused diverse summaries reduces forwarded messages by 34% to 92% and achieves redundancy-awareness with an F-score ranging from 0.80 to 0.95, depending on k. The quality of the summaries is also consistent with the ideal summaries agreed upon by human judges. The summarisation approaches come at the expense of reduced system performance. The thesaurus/ontology-based approach proved 6 times more latency-heavy and 3 times more memory-heavy than the most expensive embeddings-based approach and had lower throughput, but it provided slightly better-quality summaries.

Open Access
A Framework for Exploration and Visualization of SPARQL Endpoint Information

Widely accepted standards, such as the Resource Description Framework, have provided unified ways of publishing data, aiming to facilitate the exchange of information between machines. This information has become of interest to a wider audience due to its volume and variety, but the available formats pose significant challenges to users with limited knowledge of the Semantic Web. The SPARQL query language lowers this barrier by facilitating the exploration of this information, and many data providers have created dedicated SPARQL endpoints for their data. Many efforts have been dedicated to developing systems that provide access to these endpoints and support their exploration in a semantically correct and user-friendly way. The main challenge for such approaches is the diversity of the information contained in the endpoints, which renders holistic or schema-specific solutions obsolete. We present an integrated platform that supports users in querying, exploring, and visualizing the information contained in SPARQL endpoints. The platform handles each query result independently, based only on its characteristics, offering an endpoint- and data-schema-agnostic solution. This is achieved through a Decision Support System built on a knowledge base of information collected experimentally from many endpoints, which allows us to provide case-specific visualization strategies for SPARQL query results based exclusively on features extracted from the result.

Open Access