TabbyLD: A Tool for Semantic Interpretation of Spreadsheets Data

Nikita O. Dorodnykh, Aleksandr Yu. Yurin

doi:10.1007/978-3-030-68527-0_20

Abstract

Spreadsheets are one of the most convenient ways to structure and represent statistical and other data. Consequently, automatic processing and semantic interpretation of spreadsheet data have become an active area of research, especially in the context of integrating such data into the Semantic Web. In this paper, we propose TabbyLD, a tool for the semantic interpretation of data extracted from spreadsheets. The main features of our software are: (1) the use of original metrics for determining the semantic similarity between cell values and entities of a global knowledge graph, namely string similarity, NER label similarity, heading similarity, semantic similarity, and context similarity; (2) the use of a unified canonicalized form for representing arbitrary spreadsheets; (3) the integration of TabbyLD with the TabbyDOC project’s tools as part of the overall pipeline. We present the TabbyLD architecture, its main functions, a method for annotating spreadsheets that includes the original similarity metrics, an illustrative example, and a preliminary experimental evaluation. In our evaluation, we used the T2Dv2 Gold Standard dataset. The experiments show that TabbyLD is applicable to the semantic interpretation of spreadsheet data. We also identified some issues in this process.

Full Text