VersaMatch: Ontology Matching with Weak Supervision

Jonathan Fürst, Bin Cheng, Mauricio Fadel Argerich

doi:10.14778/3583140.3583148

Abstract

Ontology matching is crucial to data integration for cross-silo data sharing and has mainly been addressed with heuristic and machine learning (ML) methods. While heuristic methods are often inflexible and hard to extend to new domains, ML methods rely on substantial amounts of labeled training data that are hard to obtain. To overcome these limitations, we propose VersaMatch, a flexible, weakly supervised ontology matching system. VersaMatch employs various weak supervision sources, such as heuristic rules, pattern matching, and external knowledge bases, to produce labels from a large amount of unlabeled data for training a discriminative ML model. For prediction, VersaMatch uses a novel ensemble model that combines the weak supervision sources with the discriminative model to support generalization while retaining high precision. Our ensemble method boosts end model performance by 4 points compared to a traditional weak-supervision baseline. In addition, compared to state-of-the-art ontology matchers, VersaMatch achieves an overall 4-point improvement in F1 score across 26 ontology combinations from different domains. For recently released, in-the-wild datasets, VersaMatch beats the next best matchers by 9 points in F1. Furthermore, its core weak-supervision logic can easily be improved by adding more knowledge sources and collecting more unlabeled data for training.
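
The weak-supervision idea described in the abstract can be illustrated with a minimal sketch. This is not VersaMatch's implementation: the labeling functions, the toy synonym table, the majority-vote aggregation, and the fallback rule in `ensemble_predict` are all assumptions chosen for illustration, and `model_score` stands in for any trained discriminative model.

```python
import re
from collections import Counter

MATCH, NO_MATCH, ABSTAIN = 1, 0, -1

# Toy stand-in for an external knowledge base (hypothetical entries).
SYNONYMS = {frozenset({"temp", "temperature"}), frozenset({"lat", "latitude"})}


def tokens(name):
    """Split camelCase or snake_case names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def lf_exact(a, b):
    """Heuristic rule: identical token sequences vote for a match."""
    return MATCH if tokens(a) == tokens(b) else ABSTAIN


def lf_synonym(a, b):
    """Knowledge-base lookup: vote match if the names are listed as synonyms."""
    key = frozenset({"".join(tokens(a)), "".join(tokens(b))})
    return MATCH if key in SYNONYMS else ABSTAIN


def lf_id_pattern(a, b):
    """Pattern rule: an identifier field rarely matches a non-identifier field."""
    return NO_MATCH if ("id" in tokens(a)) != ("id" in tokens(b)) else ABSTAIN


LABELING_FUNCTIONS = [lf_exact, lf_synonym, lf_id_pattern]


def weak_label(a, b):
    """Aggregate non-abstaining votes by simple majority; ties abstain."""
    votes = [v for v in (lf(a, b) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    counts = Counter(votes)
    if counts[MATCH] == counts[NO_MATCH]:
        return ABSTAIN  # covers both "no votes" and "tied votes"
    return MATCH if counts[MATCH] > counts[NO_MATCH] else NO_MATCH


def ensemble_predict(a, b, model_score, threshold=0.5):
    """Assumed combination rule: trust the weak sources when they agree on a
    label, otherwise fall back to the discriminative model's score."""
    label = weak_label(a, b)
    if label != ABSTAIN:
        return label
    return MATCH if model_score(a, b) >= threshold else NO_MATCH
```

In a full pipeline, the output of `weak_label` over a large pool of unlabeled attribute pairs would serve as training labels for the discriminative model, and `ensemble_predict` would be applied at prediction time. Real systems typically replace the majority vote with a learned label model that weights sources by estimated accuracy.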

Journal: Proceedings of the VLDB Endowment
Publication Date: Feb 1, 2023
Citations: 1
License type: CC BY-NC-ND

Similar Papers

A systematic machine learning method for reservoir identification and production prediction
Wei Liu ... Yuan Hu
Petroleum Science | VOL. 20
01 Feb 2023

P125. Development of a novel ensemble machine learning algorithm for prediction of complications and readmission after anterior cervical spinal fusion
Akash A Shah ... Nelson Soohoo
The Spine Journal | VOL. 21
10 Aug 2021

P126. Development of a novel ensemble machine learning algorithm for prediction of complications and readmission after posterior cervical spinal fusion
Akash A Shah ... Nelson Soohoo
The Spine Journal | VOL. 21
10 Aug 2021

A review of machine learning applications in wildfire science and management
Piyush Jain ... Mark Crowley
Environmental Reviews | VOL. 28
28 Jul 2020
