Two-stage Detection of Semantic Redundancies in RDF Data

Yiming Chen, Daiyi Li, Li Yan, Zongmin Ma

doi:10.13052/jwe1540-9589.2184

Abstract

As RDF (Resource Description Framework) data becomes richer, integrating diverse data sources may produce duplicate RDF data. Failing to detect these duplicates effectively introduces redundancies into the integrated RDF datasets, which not only unnecessarily increases the size of the datasets but also reduces their quality. Traditionally, a similarity calculation is applied to determine whether a pair of candidates are duplicates. For massive RDF data, a simple similarity calculation may be extremely inefficient. To detect duplicates in massive RDF data, we propose in this paper a detection approach based on RDF data clustering and similarity measurement. We first propose a clustering method based on locality-sensitive hashing (LSH), which can efficiently select candidate pairs in RDF data. A similarity calculation is then performed on the selected candidate pairs to obtain the duplicate candidates. We show through experiments that our approach can quickly extract the duplicate candidates in RDF datasets. Our approach achieved the highest F-score and the best time performance in the OAEI (Ontology Alignment Evaluation Initiative) 2019 competition.
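
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the hash family, the entity representation, or the similarity measure, so everything concrete here is an assumption made for illustration: each entity is represented as a non-empty set of "predicate=object" tokens, stage one uses MinHash signatures with banding as the LSH scheme, and stage two uses Jaccard similarity with a hypothetical threshold of 0.7. The function names are likewise hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per seed (tokens must be non-empty)."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_candidate_pairs(entities, num_hashes=32, bands=8):
    """Stage 1 (assumed scheme): bucket entities by signature band.

    Entities that share at least one identical band become candidate pairs.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_duplicates(entities, threshold=0.7, num_hashes=32, bands=8):
    """Stage 2 (assumed measure): keep candidate pairs whose Jaccard similarity
    meets the threshold."""
    candidates = lsh_candidate_pairs(entities, num_hashes, bands)
    return sorted(
        (a, b) for a, b in candidates
        if jaccard(entities[a], entities[b]) >= threshold
    )
```

Calling `detect_duplicates` on a dictionary that maps entity identifiers to token sets returns the sorted list of identifier pairs judged to be duplicates. The point of the first stage is that the similarity function runs only on pairs sharing a bucket, rather than on every pair of entities in the dataset.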

Journal: Journal of Web Engineering

Similar Papers

RDF packages: a scheme for efficient reasoning and querying over large‐scale RDF data
Shohei Ohsawa ... Toshiyuki Amagasa
International Journal of Web Information Systems | VOL. 8
15 Jun 2012

Faceted fusion of RDF data
Wenqiang Liu ... Siyu Yao
Information Fusion | VOL. 23
04 Jul 2014

BRGP: a balanced RDF graph partitioning algorithm for cloud storage
Yonglin Leng ... Fangming Zhong
Concurrency and Computation: Practice and Experience | VOL. 29
02 Aug 2016

RDF Data-Centric Storage
Justin J Levandoski ... Mohamed F Mokbel
01 Jul 2009
