Reconciling Schema Matching Networks Through Crowdsourcing

Nguyen Quoc Viet Hung, Karl Aberer, Nguyen Thanh Tam, Zoltán Miklós

doi:10.4108/cc.1.2.e2

Abstract

Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, human effort is usually required to validate the generated correspondences. This validation process is often costly, as it is performed by highly skilled experts. Our paper analyzes how to leverage crowdsourcing techniques so that a large group of non-experts can validate the generated correspondences. In our work we assume that one needs to establish attribute correspondences not only between two schemas but in a network. We also assume that the matching is realized in a pairwise fashion, in the presence of consistency expectations about the network of attribute correspondences. We demonstrate that formulating these expectations in the form of integrity constraints can improve the process of reconciliation. Because worker input in crowdsourcing is unreliable, specific aggregation techniques are needed to obtain good-quality results. We demonstrate that consistency constraints not only improve the quality of aggregated answers, but also enable us to estimate the quality of individual workers' answers more reliably and to detect spammers. Moreover, these constraints make it possible to minimize the human effort needed for the same expected quality of results.
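To make the idea of aggregating unreliable answers under consistency expectations concrete, the following is a minimal sketch and not the method of the paper. It assumes plain majority voting as the aggregation step and a one-to-one constraint (an attribute should match at most one attribute of any other schema) as the integrity constraint; the worker names, attribute names, and helper functions are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical worker answers: (worker, correspondence, approved?).
# A correspondence is a pair of attributes written as "schema.attribute".
answers = [
    ("w1", ("A.name", "B.title"), True),
    ("w2", ("A.name", "B.title"), True),
    ("w3", ("A.name", "B.title"), False),
    ("w1", ("A.name", "B.label"), False),
    ("w2", ("A.name", "B.label"), True),
    ("w3", ("A.name", "B.label"), True),
    ("w1", ("A.date", "B.year"), True),
    ("w2", ("A.date", "B.year"), True),
    ("w3", ("A.date", "B.year"), True),
]


def majority_vote(answers):
    """Accept a correspondence if more workers approve it than reject it."""
    votes = defaultdict(Counter)
    for _worker, corr, approved in answers:
        votes[corr][approved] += 1
    return {corr: c[True] > c[False] for corr, c in votes.items()}


def schema_of(attr):
    """Return the schema prefix of an attribute such as "A.name"."""
    return attr.split(".", 1)[0]


def one_to_one_violations(accepted):
    """Group accepted correspondences that map one attribute to several
    attributes of the same other schema (assumed one-to-one constraint)."""
    targets = defaultdict(set)
    for a, b in accepted:
        targets[(a, schema_of(b))].add((a, b))
        targets[(b, schema_of(a))].add((a, b))
    return [sorted(group) for group in targets.values() if len(group) > 1]


decisions = majority_vote(answers)
accepted = [corr for corr, ok in decisions.items() if ok]
conflicts = one_to_one_violations(accepted)
print(conflicts)
```

In this toy input, majority voting alone accepts both `A.name`/`B.title` and `A.name`/`B.label`, and the constraint check flags the pair as conflicting. A conflict of this kind signals that at least one aggregated decision is wrong, which is the type of information the abstract describes exploiting.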

Highlights

More and more online services enable users to upload and share structured data, including Google Fusion Tables [1], Freebase [2], and Factual [3].
In the following we introduce the schema matching network model [7] that we will use in our work.
We review salient work in the schema matching and crowdsourcing areas that is related to our research.

Summary

Introduction

More and more online services enable users to upload and share structured data, including Google Fusion Tables [1], Freebase [2], and Factual [3]. These services primarily offer easy visualization of uploaded data as well as tools to embed the visualization into blogs or Web pages. An example is the often-quoted coffee consumption data found in Google Fusion Tables, which is distributed among different tables, each representing a specific region [1]. Extracting information over all regions requires a means to query and aggregate across multiple tables, which creates the need to interconnect schemas to achieve an integrated view of the data. The number of publicly available datasets is growing rapidly, making integration increasingly challenging.


Journal: EAI Endorsed Transactions on Collaborative Computing
Publication Date: Oct 15, 2014
Citations: 5
License type: CC BY


