Utilizing WordNet and Regular Expressions for Instance-based Schema Matching

Ahmed Mounaf Mahdi, Sabrina Tiun

doi:10.19026/rjaset.8.994

Abstract

Instance-based matching is the process of finding correspondences between schema elements by comparing the data stored in different data sources. It serves as an alternative when matching on the schema elements themselves fails. Instance-based matching is applied in many areas, such as website creation and management, schema evolution and migration, data warehousing, database design and data integration. Schema information (element name, description, data type, etc.) is sometimes unavailable, or is not enough to find the correct match, especially when element names are abbreviations; when schema-level matching fails, the next step is to examine the values stored in the schemas. For these reasons, many recent approaches focus on instance-based matching. In this study, we propose an approach that combines pattern recognition using regular expressions for the numerical domain with WordNet for the string domain, producing a similarity coefficient in the range (0,1). In a previous approach, regular expressions achieved good accuracy on numerical instances only; they were not applied to string instances, because deciding whether two strings match requires knowing their meaning. Using WordNet-based measures for string instances is therefore expected to improve effectiveness in terms of Precision (P), Recall (R) and F-measure (F). The approach is evaluated on a real dataset, and the results are better than those obtained using only an equality measure for strings, especially when the schemas are disjoint. The approach achieved an F-measure (F) of 95.3%.
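
The regular-expression side of the approach can be illustrated with a small sketch. The abstract does not give the authors' actual expressions or scoring formula, so the pattern classes (`date`, `phone`, `decimal`, `integer`), the function names and the use of histogram intersection below are hypothetical assumptions: each value of a numeric column is mapped to the first pattern it fully matches, and two columns are scored by the overlap of their pattern distributions, which gives a coefficient between 0 and 1. The WordNet-based measure for string columns is not shown, since it depends on a lexical database.

```python
import re
from collections import Counter

# Hypothetical pattern classes: the paper's actual expressions are not given.
# Order matters, because each value is assigned to the first pattern that fully matches it.
PATTERNS = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("phone", re.compile(r"\d{3}-\d{3}-\d{4}")),
    ("decimal", re.compile(r"-?\d+\.\d+")),
    ("integer", re.compile(r"-?\d+")),
]


def pattern_of(value: str) -> str:
    """Name of the first pattern that matches the whole (stripped) value, else 'other'."""
    text = value.strip()
    for name, regex in PATTERNS:
        if regex.fullmatch(text):
            return name
    return "other"


def pattern_distribution(values: list[str]) -> dict[str, float]:
    """Fraction of a column's values that fall into each pattern class."""
    counts = Counter(pattern_of(v) for v in values)
    return {name: n / len(values) for name, n in counts.items()}


def regex_similarity(col_a: list[str], col_b: list[str]) -> float:
    """Histogram intersection of the two pattern distributions (a value between 0 and 1)."""
    if not col_a or not col_b:
        return 0.0
    da, db = pattern_distribution(col_a), pattern_distribution(col_b)
    return sum(min(da.get(k, 0.0), db.get(k, 0.0)) for k in da.keys() | db.keys())


phones_a = ["555-123-4567", "555-987-6543"]
phones_b = ["212-555-0100", "646-555-0199", "12.5"]
print(regex_similarity(phones_a, phones_b))  # about 0.67: one of three values in phones_b is a decimal
```

Histogram intersection is used here only because it is a simple measure that stays in the 0 to 1 range; any distribution-similarity measure with that range could be substituted.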

Highlights

A database schema is the structure of a database; it describes the arrangement of the database's instances, relationships and constraints (Gillani et al., 2013). Database schemas play an important role when different database applications must be integrated.
Structural heterogeneity consists of type conflicts, dependency conflicts, key conflicts or behavioral conflicts, whereas semantic heterogeneity comprises semantic conflicts, that is, differences between the databases that relate to the semantic meaning and the intended meaning of the data.
The same degree[i][j] matrix used for WordNet-based matching is reused here under the name degreeReg, and the maximum value in each row is taken to select the correct correspondences from the list of candidates.
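
The row-maximum selection described above can be sketched as follows. This is a minimal illustration, assuming `degreeReg` is an m-by-n matrix of similarity values between 0 and 1 (rows are source-schema elements, columns are target-schema elements); the function names, the `threshold` parameter and the toy matrix are hypothetical. Each row contributes the column holding its largest value as a candidate correspondence, and the selected pairs are then scored against a reference mapping with Precision, Recall and F-measure.

```python
def select_by_row_max(degree, threshold=0.0):
    """Pick, for each source row i, the target column j with the largest degree[i][j].

    Rows whose best value does not exceed `threshold` (a hypothetical cut-off)
    produce no correspondence.
    """
    selected = set()
    for i, row in enumerate(degree):
        if not row:
            continue
        j = max(range(len(row)), key=row.__getitem__)  # first index of the row maximum
        if row[j] > threshold:
            selected.add((i, j))
    return selected


def precision_recall_f(found, gold):
    """Precision, Recall and F-measure of the found pairs against a reference set."""
    tp = len(found & gold)  # correctly identified correspondences
    p = tp / len(found) if found else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy degreeReg matrix: 3 source elements x 3 target elements (made-up values).
degree_reg = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.4, 0.6, 0.5],
]
found = select_by_row_max(degree_reg)  # {(0, 0), (1, 2), (2, 1)}
gold = {(0, 0), (1, 2), (2, 2)}        # hypothetical reference mapping
print(precision_recall_f(found, gold))
```

In this toy example two of the three selected pairs are correct and the reference also has three pairs, so Precision, Recall and F-measure are all 2/3.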

Summary

Introduction

A database schema is the structure of a database; it describes the arrangement of the database's instances, relationships and constraints (Gillani et al., 2013). Database schemas play an important role when different database applications must be integrated. The problem that arises when two different databases are integrated is heterogeneity, which is divided into two types: structural heterogeneity and semantic heterogeneity. Structural heterogeneity consists of type conflicts, dependency conflicts, key conflicts or behavioral conflicts, whereas semantic heterogeneity comprises semantic conflicts, that is, differences between the databases that relate to the semantic meaning and the intended meaning of the data. Schema matching is needed to resolve this heterogeneity (Gillani et al., 2013).

Journal: Research Journal of Applied Sciences, Engineering and Technology
Publication Date: Jul 25, 2014
Citations: 33
License type: cc-by

Similar Papers

Instance based Matching using Regular Expression
Osama A Mehdi ... Lilly Suriani Affendey
Procedia Computer Science | VOL. 10 | 01 Jan 2012

WebLens: Towards Web-scale Data Integration, Training the Models
Rituparna Khan ... Michael Gubanov
10 Dec 2020

Data schema design as a schema evolution process
H.A Proper
Data & Knowledge Engineering | VOL. 22 | 01 Apr 1997

Effect of thesaurus size on schema matching quality
Thabit Sabbah ... Tutut Herawan
Knowledge-Based Systems | VOL. 71 | 16 Aug 2014
