DWSpyder: a new schema extraction method for a deep web integration system

Yasser Saissi, Ali Adri, Ahmed Zellou

doi:10.1504/ijwet.2019.102872

Abstract

The deep web is the large part of the web that search engines do not index. Deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access deep web sources and all of their information. To implement such a system, we need to know the schema description of each web source. The problem addressed in this paper is how to extract the schema describing an inaccessible deep web source. We propose the DWSpyder method, which extracts the schema describing a deep web source despite its inaccessibility. DWSpyder starts with a static analysis of the source's access forms to extract the first elements of the associated schema description. Its second step is a dynamic analysis of these access forms, using queries to enrich the schema description. DWSpyder also uses a clustering algorithm to identify the possible values of form fields whose sets of values are undefined. All of the extracted information is then used by DWSpyder to automatically generate the schema description of the deep web source.
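
To make the static analysis step more concrete, the following is a minimal sketch of how the fields of an access form, and any values the form predefines for them, might be collected. It is an illustration under stated assumptions, not the DWSpyder implementation: the class `FormFieldExtractor`, the function `extract_form_schema`, and the sample form are all hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Hypothetical helper: collect form fields and their predefined values."""

    def __init__(self):
        super().__init__()
        self.fields = {}              # field name -> list of predefined values
        self._current_select = None   # name of the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            # Free-text inputs end up with an empty (undefined) value set.
            self.fields.setdefault(name, [])
            if attrs.get("type") in ("radio", "checkbox") and attrs.get("value"):
                self.fields[name].append(attrs["value"])
        elif tag == "select" and name:
            self._current_select = name
            self.fields.setdefault(name, [])
        elif tag == "option" and self._current_select:
            value = attrs.get("value")
            if value:  # skip placeholder options such as "Any"
                self.fields[self._current_select].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


def extract_form_schema(html):
    """Return a mapping of field names to the values the form predefines."""
    parser = FormFieldExtractor()
    parser.feed(html)
    return parser.fields


# Assumed sample access form, for illustration only.
SAMPLE_FORM = """
<form action="/search">
  <input type="text" name="title">
  <select name="genre">
    <option value="">Any</option>
    <option value="fiction">Fiction</option>
    <option value="science">Science</option>
  </select>
  <input type="radio" name="format" value="paperback">
  <input type="radio" name="format" value="ebook">
</form>
"""

schema = extract_form_schema(SAMPLE_FORM)
print(schema)
```

In this sketch, a field such as `title` is left with an empty value list. That is the kind of field with an undefined set of values for which the abstract says query-based dynamic analysis and clustering are used.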

Journal: International Journal of Web Engineering and Technology
