Abstract

We present JedAI, a new open-source toolkit for end-to-end Entity Resolution. JedAI is domain-agnostic in the sense that it does not depend on background expert knowledge, applying seamlessly to data of any domain with minimal human intervention. JedAI is also structure-agnostic, as it can process any type of data, ranging from structured (relational) to semi-structured (RDF) and unstructured (free-text) entity descriptions. JedAI consists of two parts: (i) JedAI-core is a library of numerous state-of-the-art methods that can be mixed and matched to form (thousands of) end-to-end workflows, allowing for easily benchmarking their relative performance. (ii) JedAI-gui is a user-friendly desktop application that facilitates the composition of complex workflows via a wizard-like interface. It is suitable for both lay and power users, offering concrete guidelines and automatic configuration, as well as manual configuration options, visual exploration, and detailed statistics for each method's performance. In this paper, we also delve into the new features of JedAI's latest version (2.1), and demonstrate its performance experimentally.

Highlights

  • Entity Resolution (ER) aims to detect different entity profiles that describe the same real-world objects [4]

  • Users can identify the weak link in an end-to-end workflow and assess whether it needs a better parameter configuration or should be replaced by another method

  • To explore the potential of this workflow, we fine-tuned these parameters in three ways: (i) step-by-step random configuration, where we used the methodology of [26] for independently optimizing each method until CNP and the F-Measure for optimizing the last two methods, (ii) holistic random configuration, whose goal is to maximize the overall F-Measure, and (iii) step-by-step grid configuration, where we used the same criteria as the first case
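
The difference between random and grid configuration can be illustrated with a minimal sketch. It does not use JedAI's API and does not reproduce the step-by-step criteria of [26]; it only contrasts random and grid sampling of parameters under a holistic objective (the overall F-Measure). The toy profiles, the two-step workflow (token blocking with a block-size cap, followed by Jaccard threshold matching), and all function and parameter names are assumptions made for illustration.

```python
import itertools
import random

# Hypothetical toy data, not taken from the paper.
PROFILES = {
    1: "jedai entity resolution toolkit",
    2: "jedai toolkit for entity resolution",
    3: "silk link discovery framework",
    4: "silk framework link discovery",
    5: "limes link discovery",
}
TRUE_MATCHES = {(1, 2), (3, 4)}


def jaccard(a, b):
    """Jaccard similarity of two whitespace-tokenized strings."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def run_workflow(max_block_size, threshold):
    """Token blocking with a block-size cap, then threshold matching."""
    blocks = {}
    for pid, text in PROFILES.items():
        for token in set(text.split()):
            blocks.setdefault(token, set()).add(pid)
    candidates = set()
    for members in blocks.values():
        if len(members) <= max_block_size:  # drop oversized blocks
            candidates.update(itertools.combinations(sorted(members), 2))
    return {p for p in candidates
            if jaccard(PROFILES[p[0]], PROFILES[p[1]]) >= threshold}


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall over matched pairs."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def grid_search(sizes, thresholds):
    """Evaluate every combination on a fixed grid; keep the best."""
    best_score, best_cfg = -1.0, None
    for size, thr in itertools.product(sizes, thresholds):
        score = f_measure(run_workflow(size, thr), TRUE_MATCHES)
        if score > best_score:
            best_score, best_cfg = score, (size, thr)
    return best_score, best_cfg


def random_search(trials, seed=0):
    """Sample configurations at random; keep the best."""
    rng = random.Random(seed)
    best_score, best_cfg = -1.0, None
    for _ in range(trials):
        size, thr = rng.randint(2, 5), rng.uniform(0.1, 0.9)
        score = f_measure(run_workflow(size, thr), TRUE_MATCHES)
        if score > best_score:
            best_score, best_cfg = score, (size, thr)
    return best_score, best_cfg
```

Calling `grid_search([2, 3], [0.3, 0.5, 0.9])` evaluates six fixed configurations, whereas `random_search(20)` draws twenty configurations from the assumed ranges. A step-by-step variant would instead tune each step against its own criterion before moving to the next one.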

Summary

INTRODUCTION

Entity Resolution (ER) aims to detect different entity profiles that describe the same real-world objects [4]. LIMES (http://aksw.org/Projects/LIMES.html) and Silk are the most prominent representatives. Most of these tools implement only the method(s) introduced by their creators, and/or are suitable for power users, requiring the manual configuration of matching rules, or a labeled dataset for learning such rules in a supervised way [14]. Another drawback is that none of them is applicable to structured data, while half of them lack a GUI [20]. The rest of the paper is structured as follows: Section 2 delves into JedAI's architecture, Section 3 elaborates on the new features in version 2.1, Section 4 presents experiments that highlight the potential of JedAI, and Section 5 concludes the paper along with directions for future work.

ARCHITECTURE
JedAI-core
JedAI-gui
EXPERIMENTS
DBLP Scholar
D2 datasets D3
Findings
CONCLUSIONS