Chapter 8 Context-Based Entity Matching for Big Data

Mayesha Tasnim, Maria-Esther Vidal, Diego Collarana, Damien Graux

doi:10.1007/978-3-030-53199-7_8

Open Access

https://doi.org/10.1007/978-3-030-53199-7_8

Abstract

In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts; e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets, because a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm to match RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
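The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: entities are flattened to single-valued property dictionaries, similarity is a plain Jaccard score over property-value pairs, the Formal Concept Analysis step is replaced by a simple check that both entities agree on every context property, and the 1-1 matching is found by brute force over permutations. All function names and sample entities are hypothetical.

```python
from itertools import permutations


def similarity(e1, e2):
    """Jaccard similarity over (property, value) pairs (an assumed metric)."""
    a, b = set(e1.items()), set(e2.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_context(e1, e2, context):
    """Simplified stand-in for the FCA step: both entities must carry
    the same value for every property named in the context."""
    return all(p in e1 and e1[p] == e2.get(p) for p in context)


def match_entities(left, right, context):
    """Return the 1-1 assignment with the highest total score,
    keeping only pairs that are valid under the given context."""
    l_ids, r_ids = list(left), list(right)
    if len(l_ids) != len(r_ids):
        raise ValueError("this sketch assumes equally sized entity sets")

    def score(l, r):
        if not same_context(left[l], right[r], context):
            return 0.0
        return similarity(left[l], right[r])

    # Brute force over all 1-1 assignments; fine for tiny inputs only.
    best = max(
        permutations(r_ids),
        key=lambda perm: sum(score(l, r) for l, r in zip(l_ids, perm)),
    )
    return [(l, r) for l, r in zip(l_ids, best) if score(l, r) > 0]


# Hypothetical entities from two datasets.
left = {
    "a:Person1": {"name": "Ada", "occupation": "Engineer", "city": "Bonn"},
    "a:Person2": {"name": "Ada", "occupation": "Teacher", "city": "Bonn"},
}
right = {
    "b:PersonX": {"name": "Ada", "occupation": "Teacher", "city": "Bonn"},
    "b:PersonY": {"name": "Ada", "occupation": "Engineer", "city": "Hannover"},
}

by_occupation = match_entities(left, right, {"occupation"})
by_city = match_entities(left, right, {"city"})
```

With the sample data, the occupation context pairs both entities, while the city context keeps only the pair that shares a city, which shows how the chosen context changes the set of valid matches. For larger inputs, the brute-force search could be swapped for an assignment solver such as `scipy.optimize.linear_sum_assignment`.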

Highlights

In the Big Data era, variety is one of the most dominant dimensions bringing new challenges for data-driven tasks
Building on the entity matching component of the data integration technique proposed by Collarana et al. [84], we propose COMET, an entity matching approach that merges equivalent Resource Description Framework (RDF) entities based on context
COMET matches RDF graphs that are equivalent under a given context, addressing the problem of context-aware matching of RDF graphs

Summary

Introduction

In the Big Data era, variety is one of the most dominant dimensions, bringing new challenges for data-driven tasks. Variety alludes to the types and sources of data, which are becoming increasingly heterogeneous with new forms of data. At one point in time, the only sources of digital data were spreadsheets and databases. Today, data is collected from emails, photographs, digital documents, or audio. The variety of unstructured and semi-structured data creates issues during data analysis. These varying forms of data must be integrated for consistency in storage, mining, and analysis. The process of integrating such complex and semi-structured data poses its own set of challenges. The same real-world object may be represented in different data sources as different entities, so it is challenging to identify entities that refer to the same real-world object.

Publication Date: Jan 1, 2020
Citations: 3
License type: CC BY 4.0
