Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases

Gergely Honti, János Abonyi

doi:10.3390/math9040450

Abstract

Triplestores, or resource description framework (RDF) stores, are purpose-built databases used to organise, store and share data with context. Knowledge extraction from large amounts of interconnected data requires effective tools and methods that address the complexity and underlying structure of semantic information. We propose a method that generates an interpretable multilayer network from an RDF database. The method applies frequent itemset mining (FIM) to the subjects, predicates and objects of the RDF data and automatically extracts informative subsets of the database for analysis. The results are used to form the layers of an analysable multidimensional network. The methodology enables consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate its usability and effectiveness, we analyse how the science of sustainability and climate change is structured, using the Microsoft Academic Knowledge Graph. In the case study, FIM forms networks of disciplines that reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases, which can be analysed simultaneously as layers of a multilayer network.
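The idea of mining frequent itemsets and turning them into network layers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `papers` data, the function names `frequent_itemsets` and `build_layers`, and the choice to build one co-occurrence layer per frequent itemset are all assumptions made for illustration. The search is a plain level-wise (Apriori-style) procedure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each paper mapped to its set of disciplines.
papers = {
    "p1": {"meteorology", "atmospheric science", "climatology"},
    "p2": {"meteorology", "atmospheric science"},
    "p3": {"oceanography", "geomorphology"},
    "p4": {"oceanography", "geomorphology", "climatology"},
    "p5": {"meteorology", "atmospheric science", "oceanography"},
}


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset (frozenset) to the
    number of transactions that contain it.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    result = {}
    size = 1
    while level:
        # Keep only the candidates that reach the support threshold.
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        result.update(kept)
        size += 1
        # Join step: unions of two surviving itemsets that grow by one item.
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size})
    return result


def build_layers(papers, itemsets, min_size=2):
    """Assumed layer construction: one layer per frequent itemset.

    A layer holds weighted discipline co-occurrence edges, counted only
    over the papers that contain every discipline of the itemset.
    """
    layers = {}
    for itemset in itemsets:
        if len(itemset) < min_size:
            continue
        edges = defaultdict(int)
        for disciplines in papers.values():
            if itemset <= disciplines:
                for a, b in combinations(sorted(disciplines), 2):
                    edges[(a, b)] += 1
        layers[itemset] = dict(edges)
    return layers


itemsets = frequent_itemsets(list(papers.values()), min_support=2)
layers = build_layers(papers, itemsets)
for itemset, edges in layers.items():
    print(sorted(itemset), edges)
```

In this toy setting the support threshold alone decides which discipline pairs become layers, so raising `min_support` yields fewer, denser layers.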

Highlights

Linked data (LD) represent an essential tool used to organise, store and share data with context [1].
The programs of the following case study are available on GitHub (https://github.com/abonyilab/aprioriSPARQL), and the raw dataset is available on the Microsoft Academic Knowledge Graph homepage (http://ma-graph.org/rdf-dumps/) as well as through the SPARQL endpoint (http://ma-graph.org/sparql).
Our aim was twofold: to study the realms of sustainability and climate change using the Microsoft Academic Knowledge Graph (MAKG) dataset, and to showcase the importance of proper focus so as not to get lost at scale. The applied frequent itemset mining pinpoints the important areas of the data and keeps them understandable.
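How a support threshold keeps the focus on the important areas can be sketched in a few lines of Python. The triples below are invented, the predicate `ex:hasDiscipline` is a placeholder rather than the actual MAKG vocabulary, and the helper names `transactions_by_subject` and `surviving_items` are hypothetical; the sketch only shows the general idea of grouping triples by subject and discarding rare objects.

```python
from collections import Counter, defaultdict

# Invented (subject, predicate, object) triples; the predicate names are
# placeholders and do not reflect the real MAKG schema.
triples = [
    ("paper:1", "ex:hasDiscipline", "field:climatology"),
    ("paper:1", "ex:hasDiscipline", "field:meteorology"),
    ("paper:2", "ex:hasDiscipline", "field:climatology"),
    ("paper:2", "ex:hasDiscipline", "field:ecology"),
    ("paper:3", "ex:hasDiscipline", "field:climatology"),
    ("paper:1", "ex:cites", "paper:2"),
]


def transactions_by_subject(triples, predicate):
    """Group the objects of one predicate by subject (one transaction each)."""
    groups = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            groups[subject].add(obj)
    return dict(groups)


def surviving_items(transactions, min_support):
    """Return the items that occur in at least min_support transactions."""
    counts = Counter(item for items in transactions.values() for item in items)
    return {item for item, n in counts.items() if n >= min_support}


transactions = transactions_by_subject(triples, "ex:hasDiscipline")
for threshold in (1, 2, 3):
    print(threshold, sorted(surviving_items(transactions, threshold)))
```

Sweeping the threshold in this way gives a quick view of how much of the data remains in scope before any itemsets or network layers are built.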

Summary

Introduction

Linked data (LD) represent an essential tool used to organise, store and share data with context [1]. Datasets that are published as LD form the Semantic Web. The part of the Semantic Web which is freely accessible is called the linked open data cloud (LODC). Linked open data (LOD) offer large quantities of freely available, interconnected data: statistical (linked open statistical data, LOSD) [10], governmental [11], scientific [12,13] and other annotated data [14]. The LODC [15] consists of 2973 datasets with 149.5 billion triples.

Journal: Mathematics
Publication Date: Feb 23, 2021
Citations: 3
License type: CC BY 4.0
