Abstract

Triplestores, or resource description framework (RDF) stores, are purpose-built databases used to organise, store and share data with context. Knowledge extraction from large amounts of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method applies frequent itemset mining (FIM) to the subjects, predicates and objects of the RDF data and automatically extracts informative subsets of the database for analysis. The results are used to form the layers of an analysable multidimensional network. The methodology enables consistent, transparent, multi-aspect knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change is structured using the Microsoft Academic Knowledge Graph. In the case study, FIM forms networks of disciplines that reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases, which can be analysed simultaneously as layers of a multilayer network.
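The abstract only outlines the pipeline, so the following Python sketch illustrates the general idea under stated assumptions; it is not the authors' aprioriSPARQL code. It groups RDF-style triples by subject into transactions, mines frequent itemsets with a plain level-wise Apriori (one standard FIM algorithm), turns the frequent 2-itemsets into weighted co-occurrence edges, and scores item similarity with the Jaccard index. The toy triples, the predicate names `hasFieldOfStudy` and `hasKeyword`, all helper function names, the choice of one network layer per predicate, and the use of Jaccard similarity are hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); these are NOT real MAKG URIs.
TRIPLES = [
    ("paper1", "hasFieldOfStudy", "Meteorology"),
    ("paper1", "hasFieldOfStudy", "AtmosphericScience"),
    ("paper2", "hasFieldOfStudy", "Meteorology"),
    ("paper2", "hasFieldOfStudy", "AtmosphericScience"),
    ("paper3", "hasFieldOfStudy", "Oceanography"),
    ("paper3", "hasFieldOfStudy", "Geomorphology"),
    ("paper4", "hasFieldOfStudy", "Oceanography"),
    ("paper4", "hasFieldOfStudy", "Geomorphology"),
    ("paper4", "hasFieldOfStudy", "Meteorology"),
    ("paper1", "hasKeyword", "climate"),
    ("paper1", "hasKeyword", "rainfall"),
    ("paper2", "hasKeyword", "climate"),
    ("paper2", "hasKeyword", "rainfall"),
]


def transactions_by_subject(triples, predicate):
    """Group the objects of one predicate by subject; each subject becomes a transaction."""
    baskets = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            baskets[s].add(o)
    return [frozenset(b) for b in baskets.values()]


def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: count} for itemsets with count >= min_support."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: k-item candidates built from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def cooccurrence_layer(itemsets):
    """Weighted edges {(a, b): support} taken from the frequent 2-itemsets."""
    return {tuple(sorted(s)): c for s, c in itemsets.items() if len(s) == 2}


def jaccard(itemsets, a, b):
    """Jaccard similarity of two items; both items must be frequent singletons."""
    both = itemsets.get(frozenset([a, b]), 0)
    return both / (itemsets[frozenset([a])] + itemsets[frozenset([b])] - both)


def build_layers(triples, min_support):
    """Assumption for illustration: one network layer per predicate in the triples."""
    predicates = sorted({p for _, p, _ in triples})
    return {
        p: cooccurrence_layer(apriori(transactions_by_subject(triples, p), min_support))
        for p in predicates
    }


if __name__ == "__main__":
    layers = build_layers(TRIPLES, min_support=2)
    for predicate, edges in layers.items():
        print(predicate, edges)
```

On this made-up data, with `min_support=2` the field-of-study layer keeps only the Meteorology/AtmosphericScience and Oceanography/Geomorphology pairs, with Jaccard similarities of 2/3 and 1.0; the counts are invented and say nothing about the MAKG results. Raising `min_support` shrinks each layer, which is one way FIM can keep a large linked dataset focused and interpretable.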

Highlights

  • Linked data (LD) represent an essential tool used to organise, store and share data with context [1]

  • The programs used in the case study are available on GitHub (https://github.com/abonyilab/aprioriSPARQL), and the raw dataset is available from the Microsoft Academic Knowledge Graph homepage (http://ma-graph.org/rdf-dumps/) and through its SPARQL endpoint (http://ma-graph.org/sparql)

  • Our aim was twofold: to study the realms of sustainability and climate change using the Microsoft Academic Knowledge Graph (MAKG) dataset, and to show the importance of proper focus so as not to get lost at scale; the applied frequent itemset mining pinpoints the important areas of the data and keeps them understandable


Summary

Introduction

Linked data (LD) represent an essential tool used to organise, store and share data with context [1]. Datasets that are published as LD form the Semantic Web. The part of the Semantic Web that is freely accessible is called linked open data (LOD). LOD offers large quantities of freely available, interconnected data: statistical data (linked open statistical data, LOSD) [10], governmental [11], scientific [12,13] and other annotated data [14]. The collection of such databases forms the Linked Open Data Cloud (LODC) [15], which consists of 2973 datasets with 149.5 billion triples.

