A deep web data extraction model for web mining: a review

Ily Amalina Ahmad Sabri, Mustafa Man

doi:10.11591/ijeecs.v23.i1.pp519-528

Abstract

The World Wide Web has become a large pool of information. Extracting structured data from published web pages has drawn attention over the last decade. Web data extraction (WDE) poses many challenges, owing to the variety of web data and the unstructured nature of data in hypertext markup language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. The paper focuses on data extraction using wrapper approaches and compares them with one another to identify the best approach for extracting data from online sites. To observe the efficiency of the proposed model, we compare the performance of single-web-page data extraction across different models: document object model (DOM), wrapper using hybrid DOM and JSON (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ), and WEIDJ (no-rules). Finally, the experiments showed that WEIDJ extracts data fastest, consuming less time than the other proposed methods.
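
The abstract does not describe how WHDJ or WEIDJ are implemented, so the following is only a minimal sketch of the general idea it names: walk the markup of a single web page, collect the image elements, serialise them to JSON, and time the extraction. It is not the authors' method. It assumes Python's standard-library `html.parser` (an event-based parser, not a full DOM tree), and the names `ImageCollector`, `extract_images_as_json` and `SAMPLE_PAGE` are hypothetical.

```python
import json
import time
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the attributes of every <img> element encountered (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names; self-closing <img /> tags
        # are routed here as well by the default handle_startendtag.
        if tag == "img":
            self.images.append(dict(attrs))


def extract_images_as_json(html):
    """Return (json_string, elapsed_seconds) for one page of HTML."""
    start = time.perf_counter()
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    payload = json.dumps(parser.images)
    elapsed = time.perf_counter() - start
    return payload, elapsed


# Illustrative input only; not data from the paper.
SAMPLE_PAGE = """
<html><body>
  <h1>Gallery</h1>
  <img src="a.png" alt="first">
  <p>Some text</p>
  <img src="b.jpg" alt="second" />
</body></html>
"""

if __name__ == "__main__":
    payload, elapsed = extract_images_as_json(SAMPLE_PAGE)
    print(payload)
    print(f"extracted in {elapsed:.6f} s")
```

Timing a single call like this is noisy. A comparison between extraction models of the kind the abstract reports would presumably repeat the measurement over many pages and runs.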

Journal: Indonesian Journal of Electrical Engineering and Computer Science
Publication Date: Jul 1, 2021
Citations: 1
License type: CC BY-NC 4.0

Similar Papers

Web Data Extraction Approach for Deep Web using WEIDJ
Ily Amalina Ahmad Sabri ... Ahmad Nazari Mohd Rose
Procedia Computer Science | VOL. 163
01 Jan 2019

Improving Performance of DOM in Semi-structured Data Extraction using WEIDJ Model
Ily Amalina Ahmad Sabri ... Mustafa Man
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 9
01 Mar 2018

Effective Web data extraction with standard XML technologies
Jussi Myllymaki
Computer Networks | VOL. 39
11 Apr 2002

Parallel Approach and Platform for Large-Scale WEB Data Extraction
Shen Yi ... Chunfeng Yuan
01 Dec 2013
