Classification of Deep Web Databases Based on the Context of Web Pages

Jun Ma

doi:10.3724/sp.j.1001.2008.00267

Abstract

This paper discusses new techniques for improving the classification precision of deep Web databases. The techniques are (1) using the content text of the HTML page that contains a database's entry form as context, and (2) a unification process for database attribute labels. An algorithm for locating the content text in an HTML page is developed based on multiple statistical characteristics of the page's text blocks. The unification process replaces attribute labels that are semantically close with a common delegate label. The domain and language knowledge found in the learning samples is represented as hierarchical fuzzy sets, and an algorithm for the unification process is proposed based on this representation. Building on this pre-computation, a k-NN (k nearest neighbors) algorithm is given for deep Web database classification, in which the semantic distance between two databases is calculated from both the distance between the content texts of their HTML pages and the distance between the database forms embedded in those pages. Classification experiments compare the algorithm with the pre-computation against the algorithm without it in terms of precision, recall, and F1.
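The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything concrete in it is an assumption: the abstract does not specify the distance measures, so cosine distance over page-text tokens and Jaccard distance over form labels are stand-ins; the weight `alpha`, the `DELEGATES` dictionary, and the sample data are hypothetical; and a flat dictionary lookup replaces the paper's hierarchical fuzzy sets for label unification.

```python
import math
from collections import Counter

# Hypothetical delegate table: semantically close labels map to one delegate.
# (The paper derives this from hierarchical fuzzy sets; a dict is a stand-in.)
DELEGATES = {"writer": "author", "heading": "title",
             "leaving from": "from", "departure city": "from"}

def unify(labels, delegates=DELEGATES):
    """Lower-case each attribute label and replace it with its delegate, if any."""
    return [delegates.get(l.lower(), l.lower()) for l in labels]

def cosine_distance(a, b):
    """1 - cosine similarity of two token lists (assumed text distance)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def jaccard_distance(a, b):
    """1 - Jaccard similarity of two label sets (assumed form distance)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)

def combined_distance(x, y, alpha=0.5):
    """Weighted sum of page-text distance and form distance; alpha is assumed."""
    return (alpha * cosine_distance(x["text"], y["text"])
            + (1 - alpha) * jaccard_distance(x["labels"], y["labels"]))

def knn_classify(query, samples, k=3, alpha=0.5):
    """Majority vote among the k samples nearest to the query."""
    nearest = sorted(samples, key=lambda s: combined_distance(query, s, alpha))[:k]
    votes = Counter(s["domain"] for s in nearest)
    return votes.most_common(1)[0][0]

def make(text, labels, domain=None):
    """Build a record with tokenized page text and unified form labels."""
    return {"text": text.lower().split(), "labels": unify(labels), "domain": domain}

# Made-up learning samples for two domains.
samples = [
    make("buy books online bestsellers fiction", ["Title", "Writer", "ISBN"], "books"),
    make("search our library catalog of books", ["Heading", "Author"], "books"),
    make("cheap flights airline tickets travel", ["Leaving from", "Destination", "Date"], "flights"),
    make("book airline flights and travel deals", ["Departure city", "Destination"], "flights"),
]

query = make("find books by your favourite author", ["Title", "Author"])
print(knn_classify(query, samples, k=3))
```

In this toy data, unification is what lets "Writer" and "Heading" match the query's "Author" and "Title"; without it the form distance to the book samples would be larger, which is the kind of effect the paper's with/without pre-computation comparison is designed to measure.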

Journal: Journal of Software	Publication Date: Jul 9, 2008
Citations: 17
