Extracting data models from background knowledge graphs

Daniela Oliveira, Mathieu D’Aquin

doi:10.1016/j.knosys.2021.107818

Abstract

Knowledge graphs have emerged as a core technology to aggregate and publish knowledge on the Web. However, integrating knowledge from different sources that were not specifically designed to be interoperable is not a trivial task. Finding the right ontologies to model a dataset is a challenge, since several valid data models exist and there is no clear agreement between them. In this paper, we propose to facilitate the selection of a data model with the RICDaM (Recommending Interoperable and Consistent Data Models) framework. RICDaM generates and ranks candidates that match entity types and properties in an input dataset. These candidates are obtained by aggregating freely available domain RDF datasets in a knowledge graph and then enriching the relationships between the graph’s entities. The entity type and object property candidates are obtained by exploiting the instances and structure of this knowledge graph to compute a score that considers both the accuracy and interoperability of the candidates. Datatype properties are predicted with a random forest model, trained on the knowledge graph’s properties and their values, so as to make predictions on candidate properties and rank them according to different measures. We present experiments using multiple datasets from the library domain as a use case and show that our methodology can produce meaningful candidate data models, adaptable to specific scenarios and needs.
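
To make the idea of ranking candidates by a score that weighs accuracy against interoperability more concrete, the following is a minimal sketch only. The abstract does not give the actual scoring formula, so the linear trade-off, the `weight` parameter, the `Candidate` structure, and all numeric values below are assumptions introduced purely for illustration; they are not results or definitions from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate ontology term."""
    term: str
    accuracy: float          # assumed in [0, 1]: fit to the input instances
    interoperability: float  # assumed in [0, 1]: reuse across aggregated sources


def combined_score(candidate: Candidate, weight: float = 0.5) -> float:
    """Linear trade-off between the two criteria (an assumption, not the paper's formula)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * candidate.accuracy + (1.0 - weight) * candidate.interoperability


def rank_candidates(candidates, weight=0.5):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)


# Invented scores for three real library-domain terms, for illustration only.
candidates = [
    Candidate("bibo:Book", accuracy=0.9, interoperability=0.4),
    Candidate("schema:Book", accuracy=0.7, interoperability=0.9),
    Candidate("dcterms:BibliographicResource", accuracy=0.5, interoperability=0.6),
]
ranking = [c.term for c in rank_candidates(candidates, weight=0.5)]
```

In this sketch, raising `weight` toward 1.0 favours the most accurate term, while lowering it favours the term most widely shared across sources, which is one simple way a ranking could be adapted to specific scenarios and needs.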

Full Text