Learning to Integrate Web Taxonomies

Dell Zhang, Wee Sun Lee

doi:10.2139/ssrn.3199170

Abstract

We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be to build classifiers through machine learning and then use these classifiers to classify objects from the source taxonomies into categories of the master taxonomy. However, conventional machine learning algorithms totally ignore the availability of the source taxonomies. In fact, source and master taxonomies often have common categories under different names or other more complex semantic overlaps. We introduce two techniques that exploit the semantic overlap between the source and master taxonomies to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these proposed add-on techniques can enhance various machine learning algorithms to achieve substantial improvements in performance for taxonomy integration.
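The two techniques named in the abstract can be illustrated with a minimal sketch. It is only one plausible reading of the abstract, not the authors' implementation. The linear interpolation toward a category centroid, the parameter name `lam`, the function names, and the toy data are all assumptions introduced here for illustration. The second function shows only the feature-augmentation idea behind Co-Bootstrapping (appending source-category indicators); any iterative exchange between the two taxonomies is not shown.

```python
from collections import defaultdict


def cluster_shrinkage(vectors, source_labels, lam=0.5):
    """Pull each vector toward the centroid of its source category.

    lam = 0 leaves the vectors unchanged; lam = 1 collapses every
    category onto its centroid. The linear interpolation used here is
    an assumption, not a formula taken from the abstract.
    """
    # Group the vectors by their source-taxonomy category.
    groups = defaultdict(list)
    for vec, label in zip(vectors, source_labels):
        groups[label].append(vec)

    # Component-wise mean of each group.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

    # Interpolate each vector toward its own category centroid.
    return [
        [lam * c + (1 - lam) * x for x, c in zip(vec, centroids[label])]
        for vec, label in zip(vectors, source_labels)
    ]


def add_category_indicators(vectors, source_labels, categories):
    """Append one 0/1 feature per source category to each vector."""
    return [
        list(vec) + [1.0 if label == cat else 0.0 for cat in categories]
        for vec, label in zip(vectors, source_labels)
    ]


# Hypothetical toy data: three objects from two source categories.
docs = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
labels = ["cameras", "cameras", "laptops"]

shrunk = cluster_shrinkage(docs, labels, lam=0.5)
augmented = add_category_indicators(docs, labels, ["cameras", "laptops"])
```

In this sketch the two "cameras" objects move halfway toward their shared centroid, so a classifier trained for the master taxonomy is less likely to separate them, while the indicator columns let it use source-category membership directly as a feature.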

Similar Papers

Learning to integrate web taxonomies
Dell Zhang ... Wee Sun Lee
Web Semantics: Science, Services and Agents on the World Wide Web | VOL. 2
11 Nov 2004

Web taxonomy integration using support vector machines
Dell Zhang ... Wee Sun Lee
17 May 2004

Enhancement of text categorization results via an ensemble learning technique
Wasf A Taha ... Suhad A Yousif
01 Jan 2023

Performance Analysis of Digit Recognizer Using Various Machine Learning Algorithms
Lakshmi Alekya Chittem ... Sheikh Sharfuddin Mim
01 Jan 2023
