Web Data Extraction Approach for Deep Web using WEIDJ

Ily Amalina Ahmad Sabri, Mustafa Man, Wan Aezwani Wan Abu Bakar, Ahmad Nazari Mohd Rose

doi:10.1016/j.procs.2019.12.124

Open Access

https://doi.org/10.1016/j.procs.2019.12.124

Abstract

Data extraction is one of the most prominent areas of data mining analysis and has been studied extensively, especially in the field of data requirements and reservoirs. The main aim of data extraction from semi-structured data is to retrieve useful information from the World Wide Web. Data in the large portion of the web known as the deep web is retrievable, but only through form submission, because search engines cannot reach it. Data mining applications and automatic data extraction are cumbersome because web pages vary widely in structure. Most previous data extraction techniques dealt with data types such as text, audio and video, but research that focuses on images as data is still lacking. The Document Object Model (DOM) is an example of a state-of-the-art data extraction technique related to research on mining image data, and it has been used to extract semi-structured data from the web. However, as HTML documents grow larger, DOM-based extraction suffers from lengthy processing times and noisy information. In this research work, we propose an improved model, Wrapper Extraction of Image using DOM and JSON (WEIDJ), motivated by promising results in mining larger volumes of web data across various image formats and taking into account web data extraction from the deep web. To assess the efficiency of the proposed model, we compare its data extraction performance at different levels of page extraction with that of existing methods such as VIBS, MDR, DEPTA and VIDE. WEIDJ yielded the best results, with a Precision of 100, a Recall of 97.93103 and an F-measure of 98.9547.
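As an illustration only, the sketch below shows the general idea of pulling image elements out of an HTML document and emitting them as JSON, followed by the standard precision, recall and F-measure calculation. It is not the WEIDJ algorithm, whose details are not given in the abstract. It uses Python's standard `html.parser` as a stand-in for a DOM traversal, and the names `ImageCollector`, `extract_images_as_json`, `f_measure` and `precision_recall_f` are hypothetical, introduced here for the example.

```python
import json
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src and alt attributes of every <img> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag == "img":
            record = dict(attrs)
            if record.get("src"):  # skip images that carry no source URL
                self.images.append(
                    {"src": record["src"], "alt": record.get("alt") or ""}
                )


def extract_images_as_json(html):
    """Return a JSON array describing the images found in an HTML string."""
    collector = ImageCollector()
    collector.feed(html)
    return json.dumps(collector.images)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(extracted, relevant):
    """Score a set of extracted items against a ground-truth set, on a 0-100 scale."""
    extracted, relevant = set(extracted), set(relevant)
    correct = len(extracted & relevant)
    precision = 100.0 * correct / len(extracted) if extracted else 0.0
    recall = 100.0 * correct / len(relevant) if relevant else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    page = '<div><img src="a.png" alt="A"><p>text</p><img src="b.jpg"/></div>'
    print(extract_images_as_json(page))
    # Harmonic mean of the Precision and Recall values quoted in the abstract.
    print(round(f_measure(100.0, 97.93103), 4))
```

Applying `f_measure` to the Precision and Recall quoted in the abstract gives a value that agrees with the reported F-measure of 98.9547, so the three figures are mutually consistent under the usual harmonic-mean definition.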

Journal: Procedia Computer Science
Publication Date: Jan 1, 2019
Citations: 8
License type: cc-by-nc-nd

Similar Papers

WEIDJ: Development of a new algorithm for semi-structured web data extraction
Ily Amalina Ahmad Sabri ... Mustafa Man
TELKOMNIKA (Telecommunication Computing Electronics and Control) | VOL. 19
01 Feb 2021

A deep web data extraction model for web mining: a review
Ily Amalina Ahmad Sabri ... Mustafa Man
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 23
01 Jul 2021

Improving Performance of DOM in Semi-structured Data Extraction using WEIDJ Model
Ily Amalina Ahmad Sabri ... Mustafa Man
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 9
01 Mar 2018

Performance Analysis for Mining Images of Deep Web
Ily Amalina Ahmad Sabri ... Mustafa Man
International Journal of Advanced Computer Science and Applications | VOL. 11
01 Jan 2020
